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Abstract 

A review of the Loop Algorithm, its generalizations, and its relation to 
some other Monte Carlo techniques is given. The loop algorithm is a Quantum 
Monte Carlo procedure which employs nonlocal changes of worldline configura- 
tions, determined by local stochastic decisions. It is based on a formulation of 
quantum models of any dimension in an extended ensemble of worldlines and 
graphs, and is related to Swendsen-Wang algorithms. It can be represented 
directly on an operator level, both with a continuous imaginary time path inte- 
gral and with the stochastic series expansion (SSE). It overcomes many of the 
difficulties of traditional worldline simulations. Autocorrelations are reduced by 
orders of magnitude. Grand-canonical ensembles, off-diagonal operators, and 
variance reduced estimators are accessible. In some cases, infinite systems can 
be simulated. For a restricted class of models, the fermion sign problem can 
be overcome. Transverse magnetic fields are handled efficiently, in contrast to 
strong diagonal ones. The method has been applied successfully to a variety of 
models for spin and charge degrees of freedom, including Heisenberg and XYZ 
spin models, hard-core bosons, Hubbard, and t-J-models. Due to the improved 
efficiency, precise calculations of asymptotic behavior and of quantum critical 
exponents have been possible. 
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1 Introduction and Summary 

A pedagogical review of the loop algorithm, its generalizations, and the range of 
present applications is given, including some new results. The loop algorithm [1-4] 
is a Quantum Monte Carlo procedure. It is applicable to numerous models both in 
imaginary time worldline formulation [5] and within the stochastic series expansion 
(SSE) [6-8]. It overcomes many of the difficulties of traditional worldline simulations 
by performing nonlocal changes of worldline configurations, which are determined 
by local stochastic decisions. The loop algorithm is based on a formulation of the 
worldline system in an extended ensemble which consists of both the original variables 
(spins or occupation numbers) and of graphs (sets of loops), either on the level of 
matrix elements [1-3,9], or of loop-operators [10-14]. It is related to Swendsen- 
Wang [15] cluster algorithms for classical statistical systems. It has been applied and 
generalized by a large number of authors. Before we delve into technical details, let 
us summarize the main features. 

(a) Autocorrelations between successive Monte Carlo configurations are drastically 
reduced, thereby reducing the number of Monte Carlo sweeps required for a 
given system, often by orders of magnitude. 

(b) The grand-canonical ensemble (e.g. varying magnetization, occupation number, 
winding numbers) is naturally simulated. 

(c) The continuous time limit can be taken [16], completely eliminating the Trotter- 
approximation. In fact, the loop algorithm can be formulated directly in con- 
tinuous time. 

(d) Observables can be formulated in terms of loop-properties, as so called Improved 
Estimators, reducing the errors of measured quantities. 

(c) Off-diagonal operators can be measured through improved estimators [12]. 

(f) Transverse fields can be simulated efficiently [17-19]. 

(g) For a restricted class of models, including fermionic ones, it has been shown [18- 
32] that by clever use of improved estimators the sign problem can be overcome. 

(h) Bond disorder and depleted lattices can be trivially included. The algorithm 
remains completely unchanged in any dimension. Generalizations to higher spin 
representations [9,33-38], biquadradic interactions [14,39], and to fermionic 
models [40-45] exist. 

Each of the points (a)-(g) can save orders of magnitude in computational effort over 
the traditional local worldline method. In addition, the algorithm is easier to program 
than traditional worldline updates. The method has some limitations: 

(a) Long range interactions make the algorithm more complicated and less effective. 

(b) More seriously, strong asymmetries in the Hamiltonian will make the original 
algorithm exponentially slow. This includes large magnetic fields (or chemi- 
cal potential) with ph^S and other non "particle- hole-symmetric" terms like 
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softcore bosons. The difficulty disappears wfien sucfi a field can be put into 
transverse direction (section 4.4). Otherwise alternative methods are preferable 
(see section 5). 

Some of the usual limitations of worldline methods also remain in the loop al- 
gorithm. The most serious limitation remains (so far) the sign problem, which still 
occurs in most fermionic models as well as in frustrated spin systems. Further gener- 
alizations of the meron idea (section 4.8) may help here in the future. 

The loop algorithm has already been used for many physical models. The original 
formulation [1-4] of the algorithm (in vertex language) applies directly to general 
spin i quantum spin systems in any dimensions, e.g. the 2D Heisenberg model [46], 
where improved estimators for this algorithm were first used. At the root of the loop 
algorithm is an exact mapping of the physical model to an extended phase space 
which includes loops in addition to the original worldlines. In ref. [9] it was shown in 
a general framework that this mapping is a Fortuin-Kasteleyn-like representation. A 
related mapping to a loop-model was independently used in a rigorous study of spin 
models [10,11]. The algorithm was generalized to anisotropic XYZ-models [3], with 
exphcit update probabilities given in ref. [34] and in section 2.7. The method has been 
adapted and extended to fermion systems like the Hubbard model [18,40] and to the 
t-J model [41-44]. The meron method [20] to overcome the fermion sign problem in a 
restricted class of models was developed and also applied to non-standard Hubbard- 
like models [27-29,31], to antiferromagnets in a transverse field [18,19], and to a 
partially frustrated spin model [32]. The loop algorithm was extended to quantum 
spin systems with higher spin representation [9, 33, 35-38], also for the XYZ-case [34], 
and to cases with transverse fields [17]. The extension to more than (1+1) dimensions 
is immediate [1,2]: the algorithm remains completely unchanged, only the geometry 
of the plaquette lattice changes. In ref. [16] it has been shown that the continuous 
time limit can be taken, and in ref. [12] that any n-point function can be measured, 
with diagonal and off-diagonal two-point functions being especially simple. 

A related development along a somewhat different line are the "Worm" algorithm 
in continuous time [47], "operator-loop- updates" in SSE [13] and the recent method 
of "directed loops" [48], which are applicable to a larger class of models, especially 
with strong asymmetries. 

There have been many successful large scale applications of the loop algorithm, 
both in imaginary time, and, more recently, in a variant called "deterministic operator 
loops" [13,32] within the SSE formulation, to fermionic and especially to numerous 
Heisenberg-like models, for spin | and higher spins, with and without anisotropy, 
disorder, or impurities, from spin chains up to three dimensional systems, including, 
e.g., a high statistics calculation of quantum critical exponents on regularly depleted 
lattices of up to 20000 spatial sites at temperatures down to T = 0.01 [49-51]. 

Section 2 describes the algorithm in its traditional form, with a brief review of the 
worldline representation, an intuitive outline of the loop algorithm, and a detailed step 
by step formal derivation of the algorithm, followed by a brief summary. We compute 
explicitly the update probabilities for the XXZ- model, and give a concise recipe for the 
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Heisenberg antiferromagnet. Ergodicity is treated, it is shown that in some important 
transformation of the worldhnc model to a pure loop model can be done, the 
original single cluster version is treated, and arbitrary lattices are covered. The state- 
of-the-art continuous time version is described (including a brief recipe) in section 
2.13. In section 2.14 we introduce improved estimators, and in section 2.15 simulations 
on infinite size systems. In section 2.16 we discuss the performance of the loop 
algorithm, possibilities and limitations, and some implementation issues. 

In section 3 the operator formulation of the method is introduced, which provides 
an alternate derivation directly in continuous time, and also within the stochastic se- 
ries expansion, which is discussed there, including a description of the loop algorithm 
within SSE. 

Section 4 describes a number of generalizations, some of them immediate. Section 
5 mentions related algorithms, and section 6 points to some of the physics problems 
to which the loop algorithm has been applied so far. The appendix provides a pre- 
scription to ensure the essential (yet often neglected) requirement for correct Monte 
Carlo simulations that convergence and statistical errors are properly determined. 



2 Algorithm 

We begin with the traditional formulation of loop algorithm and loop representa- 
tion by way of a finite Trotter time worldline formulation. An alternative derivation 
on an operator level is provided in section 3, and is also applicable within the stochas- 
tic series expansion. 

The loop algorithm, as usually presented, acts in the worldline representation 
which is reviewed e.g. in ref. [5]. We will develop the formal procedure for the general 
anisotropic (XYZ-like) case. As an example we shall use the particularly simple 
but important case of the one-dimensional quantum XXZ model [5]. It includes the 
Heisenberg model and hard core bosons as special cases. We will see that the same 
calculation is valid for the loop algorithm in any spatial dimension and already covers 
most of the important applications. The simplest and most important case is the loop 
algorithm for the isotropic Heisenberg antiferromagnet, which will be summarized in 
section 2.8. 



2.1 Setup: Worldline representation and equivalent vertex models 

Let us first recall the worldline representation for the example of the XXZ model 
on a one-dimensional chain of N sites [5] . The Hamiltonian is 

= E(ij) {^{StS^- + SrS^) + jJ?S?) - hEiS? , 
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Figure 1: Example of a worldline configuration on a checkerboard lattice of shaded plaquettes. (a) 
Worldlinc picture, (b) The same configuration as a vertex picture. Space direction (index i) is 
horizontal, imaginary time direction (index I) vertical. The variables S^'i are defined on each lattice 
site. Worldlines (arrows upwards in time) denote 5?; = +5, empty sites (arrows downwards in time) 
denote Sfi = —h. The Hamiltonian Hi^i+i acts on the shaded plaquettes. 



where Si are quantum spin | operators at each site i, S^, S^ are the associated raising 
and lowering operators, h is a magnetic field, and (ij) are pairs of neighbouring sites. 
We use periodic boundary conditions (arbitrary ones are possible). 
After splitting the Hamiltonian into commuting pieces 



H 

Heven,odd 



Si: even,odd 



even ~l~ Hgdd 

Hi^i+1 , 



performing a Trotter-Suzuki breakup [52, 53] 



(2.2) 



rXXZ 



tr e 



hm Z--^ 

Lt—> 00 



lim tr ( e ^4"'^'"=" e '^f 



(2.3) 



and inserting complete sets of S^ eigenstates, we arrive at the worldline representation 



Zxxz ^ ^ ^ ^ X{W,{{S,}) , 



(2.4) 



{Sti} P 



where the summation J2{S'^} extends over all "configurations" S = {S^i} of "spins" 
Sfi — ±|, which live on the sites (i,/), i — 1..N, I — 1..2Lt, of a (l-l-l)-dimensional 
checkerboard lattice. The index I — l,..,2Lt corresponds to imaginary time. The 
product Yip extends over all shaded plaquettes of that lattice (see figure 1), and Sp 
stands for the 4-tuple of spins at the corners of a plaquette p — + 1, /), (i, / + 

1), (i + 1, / + 1) ). The weight Wp at each plaquette ^ 



(2.5) 



^We keep a plaquette index p with Wp to cover spatially varying Hamiltonians. 
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where Ar = P/Lf, is given by the matrix elements^ ^ oiH — {AT)Hi^i+-i: 



W{1+) 


= a+ 






e 


At j 


g-l- 2 " 


w(i-) 


= a_ 




-^1— ) = 


e" 


At t 
- — Jz 




iy(2±) 


= c 


^ (+-|e- 


-^1+-) = 


(-+|e-^|-+) = 


At t 
- — 


cosh(^ij,.i: 


PF(3±) 


= b 




-«|-+) = 


(-+|e-^|+-) = 


At t 


sinh(^|4|) 



. . (2-6) 
Since [ffj^j+i, S^g^ — 0, there are only the six nonvanishing matrix elements given 

in eq. (2.6), namely those that conserve 5"? + Sf_^-^, as shown pictorially in figure 2. 

Therefore, the locations of = +^ in figure 1(a) can be connected by continuous 

worldlines. The worldlines close in imaginary time-direction because of the trace in 

eq. (2.3). 

For models with fermions or hard core bosons one inserts occupation number 
eigenstates instead of 5"? . Nearest neighbor hopping then again leads to the six- 
vertex case [5] of figure 2. The term "worldline" derives from this case, since here 
they connect sites occupied by particles. 

We will find it useful to also visualize worldline configurations in a slightly different 
way, namely as configurations of a vertex model [54]. To do this, we perform a one- 
to-one mapping of each worldline configuration to a vertex configuration. We stay on 
the same lattice of shaded plaquettes. We represent each spin S^i by an arrow between 
the centers of the two shaded plaquettes to which the site {i, I) belongs. The arrow 
points upwards (downwards) in time for S^i = +|(— |)- The worldline-configuration 
in figure 1(a) is thus mapped to the vertex configuration of figure 1(b). The one-to- 
one mapping of the worldline- plaquettes is shown in figure 2. The conservation of 
S^^^ on each shaded plaquette means in vertex language that for each vertex (center 
of shaded plaquette) two arrows point towards the vertex and two arrows point away 
from it. If one regards the arrows as a vector field, then this means a 

condition "divergence = zero" for the arrows. (2.7) 

Note again that vertex language and worldline language refer to the same configura- 
tions; they differ only in the pictures drawn. 

We have now mapped the XXZ quantum spin chain to the six-vertex model of 
statistical mechanics [54], though with unusual boundary conditions, since the vertex 
lattice here is tilted by 45 degrees with respect to that of the standard six-vertex 
model. Let us look more closely at the case of vanishing magnetic field, h = 0. This 

^The notation a, b, c is standard for vertex models [54], the notation 1, 3, 2 (in different order) is 
that used in refs. [9,34,55]. 

^ The matrix clement b is originally proportional to sinh {—^Jx)- It is positive for ferromagnetic 
XY couplings < 0- For antiferromagnetic XY couplings > 0, the minus signs cancel on bipartite 
lattices. Equivalently, b can be made positive on a bipartite lattice by rotating S^'V ^—S^^v on one 
of the two sublattices. We have already assumed such a rotation in eq. (2.6). 
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Figure 2: Allowed plaquette configurations Sp. Top line: worldlinc picture. Bottom line: the 
same plaquettes in vertex picture. In the XXZ case (= six- vertex case), only the plaquettes with 
continuous worldlines, Sp = 1^,2=*=, 3^, have nonzero weight, eq. (2.6). In the anisotropic XYZ case 
(= eight-vertex case) the plaquettes 4=*= also have nonzero weight. 
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Figure 3: Graphical representations of plaquette breakups. Top row: worldline picture. Middle 
row: the same graphs in the vertex picture. A "breakup" specifies the direction in which the two 
loop segments that enter each plaquette will continue. This direction can be vertical, horizontal, 
or diagonal. Each graphical representation applies in general to two breakups "G*^", as denoted 
below the pictures. In the six- vertex-case, breakups G*"' do not occur, thus the three non-freezing 
breakups , i ^ j are one-to-one equivalent to the three graphical representations. Breakup "G'-'" 
is allowed (i.e. compatible with continuity of the arrow directions of the worldlines ) in plaquette 
configurations Sp = «^ or j"^ (see figure 2). Namely, fiipping the two spins on either one of the two 
lines in the graphical representation of G'-' , i ^ j, maps between configurations i and j. Breakup 
G'*, called "freezing", forces all four spins to flip together, thus mapping between and ir . For 
example, the diagonal breakup G^^ is compatible with plaquette configurations 1^ and 3^, which 
is most obvious in the vertex picture. Starting from, e.g., plaquette 1~ and flipping two diagonally 
connected spins results in plaquettes 3+ or 3~. 



model has been exactly solved in (1+1) dimensions [54]. The exact phase diagram is 
shown in figure 4, in terms of the plaquette weights given in eq. (2.6) and in figure 2. 

It is interesting to note where the couphngs of the Trotter-discretized XXZ-model 
at /i = are located in this phase diagram (see eq. (2.6)): For the Heisenberg antifer- 
romagnet Jz = \Jx\ > we have a + 6 = c, i.e. we are on the Kosterlitz-Thoulcss line. 
As Ar— i^O, we approach the point a/c = 1, 6/c = 0. For the Heisenberg ferromagnet 
Jz — —\Jx\ < we have a — b = c, i.e. we are on the KDP transition line, approaching 
the same point a/c — 1, b/c — ets At ^ 0. When \Jz\ < \Jx\, the same point is 
approached from inside the massless (XY-like) region. When jj^j > |Jj.|, it is ap- 
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Figure 4: Exact phase diagram of the classical six-vertex model [54] at /i = 0. The weights a, b, c 
are defined in eq. (2.6). Phase III is massless (infinite correlation length). At ^ + * = 1 there is 
a Kosterlitz-Thouless phase transition [56] to the massive (finite correlation length) phase IV, and 
at ^ — ^ = 1 there is a first order KDP phase transition [57, 58] to the ferroelectric phase I. The 
parameter regions of the Heisenberg model are denoted in brackets: Jz > \Jx\ (AF), < —\Jx\ 
(FM), and |Jz| < |Ja;| (XY-likc). The weights corresponding to the XY- model (or free fermions), 
i.e. Jz = 0, are located on the circle + = c^. When At 0, the point (a/c = 1,6/c = 0) is 
approached in all cases. 



proached from below the respective transition line, i.e. from the massive (Ising-like) 
phase IV when > (AF) and from phase I when < (FM). Note that the local 
couplings a,b,c do not change in higher dimensions (see section 2.12). 

Anisotropic XYZ case: For generality later on, let us briefly describe the anisotropic 
case without magnetic field, in which ^ Jy in the Hamiltonian 

H = E(ij) JxS?S? + JySfSy + J,S?SI . (2.8) 

We also get this case if we quantize the XXZ-model along an axis different from the 
z-axis. The treatment is the same as for the XXZ-model. Again we use 5*^ eigenstates 
to insert complete sets, and arrive at the following nonvanishing matrix elements on 
the (l+l)-dimensional checkerboard lattice. 



W{1^) 


= a 


{++\e- 




= (- 


-|e-^|— ) 


At t 
= e ""^"^ 


Ch(^(j, - Jy)) , 


W{2^) 


= c 


■■= {+-\e 




= (- 


-+|e-^|-+) 




ch(^(J. + J,)), 




= b 




-«|-+) 


= (- 


-+|e-^|+-) 


— e.+ 4 


sh(-^(J, + J,)), 




= d 


(++|e- 


-^1— ) 


= (- 


-|e-^|++) 


At t 

— e 4 -^^ 


Sh(-^( J, - Jy)) , 



(2.9) 



which reduce ^ to eq. (2.6) when J^ = Jy We see that now there is an additional 
type of vertex with weight d, shown as type 4^ in figure 2, in which all four arrows 
point either towards or away from the center. This vertex type may be thought of as 

^On bipartite lattices. See footnote 4. 
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a source (resp. sink) of arrows. Eq. (2.7) becomes the 

condition "divergence = zero mod 4" for the arrows. (2-10) 

The vertices and their weights now correspond to the eight-vertex model [54]. We 
will see that very little changes for the loop algorithm in this case [3]. 

2.2 Outline of the Loop Algorithm 

The traditional way to perform Monte Carlo updates on a worldline configuration 
[5,59] consists of proposing local deformations of worldlines and accepting/rejecting 
them with suitable probability. In contrast, the updates for the loop algorithm are 
very nonlocal. We will first describe the basic idea for the example of the XXZ 
case and outline the resulting procedure. We postpone the formal discussion and the 
calculation of Monte Carlo probabilities to the next section. ^An alternative derivation 
on an operator level is provided in section 3. 

Two observations lead to the loop algorithm: 

(1) The Hamiltonian acts locally on individual plaquettes. Thus the detailed 
balance condition on Monte Carlo probabilities can be satisfied locally on those pla- 
quettes. 

(2) The allowed configurations of arrows in the six-vertex-case have zero diver- 
gence, eq. (2.7). Therefore any two allowed configurations can be mapped into each 
other by changing the arrow-direction on a set of closed loops, where along each of 
these loops, the arrows have to point in constant direction. These arc the loops that 
are constructed in our algorithm. The path of the loops will be determined locally 
on each plaquette (see below). 

An example is given in figure 5. Note that the loops are not worldlines; instead 
they consist of the locations of (proposed) changes in the worldline occupation number 
(=arrow directions = spin directions). Also, the loops are not prescribed, instead 
they will be determined stochastically, with probabilities that depend on the current 
worldline configuration. Since both the zero divergence condition and detailed balance 
can be satisfied locally at the plaquettes, we will be able to construct the loops by local 
stochastic decisions on the plaquettes, yet achieve potentially very nonlocal worldline 
updates. 

Let us construct a loop (see figure 5). This is most easily done using the vertex 
picture, where the loop follows the arrows of the spin-configuration. We start at 

^Notation: From now on we will synonymously use "plaquette" or "vertex" to refer to the shaded 
plaquettes of the checkerboard lattice. We also use interchangeably the terms "spin direction", 
"arrow direction", and "occupation number" to refer to the 2 possible states S^^ at each site (il) 
of the checkerboard lattice. We denote both probabilites and plaquettes by the letter p. Sp and 
Wp are the spin configuration at plaquette p and its weight, and Gp will be a breakup at p. "Six- 
vertex-case" (= "XXZ-case" ) and "eight- vcrtcx-casc" will refer to the local plaquette constraints (i.e. 
nonzero weights), not necessarily to the respective models of statistical mechanics themselves. 
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some arbitrary site (il) of the plaquette lattice and follow the arrow of the current 
spin configuration into the next plaquette. There we have to specify the direction in 
which we will continue. For any allowed spin configuration (see figure 2) there are 
two possibilities to continue along an arrow. We choose one of these directions and 
follow the arrow into the next plaquette. Then we continue to choose a direction 
at each of the plaquettes which we traverse. If we reach a plaquette a second time, 
there is only one direction left to go, since the loop should not overlap itself to avoid 
undoing previous changes^. Eventually we will (on a finite lattice) reach the starting 
point again, thus closing the loop. If we flip all the arrows (=spins) along the loop, 
we maintain the continuity of arrows (worldlines) at each plaquette, and thus reach 
a new valid spin configuration. 

We can now start to construct another loop (not to overlap the first one), starting 
at some other arbitrary site which has not been traversed by the first loop. Note 
that by deciding the directions in which the first loop travelled, we have at those 
plaquettes also already determined the direction in which a second loop entering the 
same plaquettes will move, namely along the two remaining arrows. Thus, what we 
actually decide at each plaquette through which a loop travels, is a "breakup" of 
the current arrow configuration into two disconnected parts, denoting the paths that 
the two loop segments on this plaquette take. The possible breakups of this kind are 
shown and described in figure 3. For each of the six possible arrow configurations i^, 
figure 2, there are two breakups which are compatible with the constraint that the 
arrows along the loop have to point in a constant direction; namely those breakups 
labelled {i 7^ j) in figure 3. 

There is another kind of "breakup" that maintains condition (2.7). Here all 4 
spins on a plaquette are forced to be flipped together. We call this choice "freezing" 
(labeled G*' in figure 3), since for a flip-symmetric model hke the six- vertex model 
ai h — it preserves the current weight W{i^). Freezing G** can also be viewed as 
consisting of either one of the two breakups G^^ {j 7^ i), with the two loop segments 
on this plaquette being "glued" together. In this view freezing causes sets of loops 
to be glued together. We shall call such a set (often a single loop) a "cluster" . All 
loops in a cluster have to be flipped together. (For an alternative, see section 3.3.) 

Overall, we see that by specifying a breakup for every shaded plaquette, the whole 
vertex configuration is subdivided into a set of clusters which consist of closed loops. 
Each site of the checkerboard lattice is part of one such loop. We shall call such a 
division of the vertex conflguration into loops a "graph" G. Flipping the direction of 
arrows (= spin or occupancy) on all sites of one or more clusters of a graph (a "cluster 
flip" which consists of "loop flips"), leads to a new allowed conflguration. In the loop 
algorithm the loops are constructed by specifying breakups with suitable probabilities 
that depend on the current conflguration (see below). In the vertex picture, the graph 
G resides on the same arrows as the spins. In the worldhne picture, the elements of 
G look slightly differently, as seen in flgures 5 and 3 . Note that by introducing loops, 

''Removing this constraint and allowing the loop to move in any direction eventually leads to the 
"directed loops" discussed in section 5.3 ! 
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Figure 5: Example of a loop update, (a) Worldlinc picture. The thick solid line denotes a single 
worldline, the dot-dashed line depicts a possible loop. By "flipping the loop", i.e. flipping the spin 
direction (arrow direction) on all sites along the loop, the worldline will be deformed into the dashed 
shape. Since the loop can potentially be very large, this deformation can be very nonlocal, (b) The 
same situation as a vertex picture. The loop is represented by the thick arrows. By construction, 
in the vertex picture each loop follows arrows of the spin-configuration. This means that the loop 
runs upwards in time-direction along sites with spin = +5, i.e. along worldlines, and downwards 
in time-direction where there is no worldline. 



we have effectively extended the space of variables, from spins, to spins and breakups. 
This point will be formalized in the next section. 

In summary, the basic procedure for one Monte Carlo update then consists of two 
stochastic mappings: First from spins to spins plus loops, and then to new spins. I.e., 
starting with the current configuration of worldlines: 

(1) Select a breakup (i.e. specify in which direction the loops will travel) for each 
shaded plaquettc with a probability that depends on the current spin configu- 
ration at that plaquette. These probabilities are discussed below. Identify the 
clusters which are implicitely constructed by these breakups. This involves a 
search through the lattice. 

(2) Flip each cluster with suitable probability, where "flipping a cluster" 
means to change the direction of all arrows along the loops in this cluster (or, 
equivalently, changing spin direction or occupation number, respectively). The 
combined cluster flips result in a new spin conflguration. The flip probabilities 
depend in general on the Hamiltonian and on the current spin conflguration. 
In the ideal case, for example the isotropic Heisenberg model in any dimension, 
each individual loop can be flipped independently with probability 1/2. 

An example is given in flgure 6. Notice that in this example the flip of a loop which 
happened to wind around the lattice in spatial direction led to a change in spatial 
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worldlines worldlines + loops new worldlines 




breakups flips 



Figure 6: Example of a worldline update with the loop algorithm. For clarity, we show a situation 
with only two worldlines. We start with the worldline configuration S in the left picture. The 
stochastic breakup decision on each plaquette, with probabilities depending on this worldline config- 
uration, defines loops (in general clusters of loops), only two of which are drawn. In this example we 
then flip both loops, i.e. flip the spin direction (= worldline occupancy) along the loops. This results 
in the new worldline configuration S' in the right picture, which, as in this example, can be very 
different from the original one. Since one of the loops happened to wind around the lattice in spatial 
direction, its flip produced a worldline configuration with nonzero winding number. Note that for 
the next worldline update, the current loop configuration is discarded, and a completely new set of 
breakups will be determined with probabilities depending on the new worldline configuration. 

winding number of the worldline configuration, i.e. an update that cannot be done by 
local deformations of worldlines. 

Little changes in the general XYZ-like (eight- vertex- like) case [3]. The loops now 
have to change direction [3] at every breakup of type (i,4). Alternatively, one can 
also omit assigning a direction to loops. 

Let us now cast the general ideas into a valid procedure. Sections 2.3 to 2.6 are 
formal and comprehensive, with detailed explanations. A summary is given in section 
2.6, and explicit weights for XXZ and XYZ cases in section 2.7. A recipe for the most 
important (yet particularly simple) case, the Heisenberg antiferromagnet, is provided 
in section 2.8. See also sections 2.13 and 3.6. Previous formal expositions can be 
found in the original loop algorithm papers [2, 3] (the best formal description there is 
that for the eight- vertex case in ref. [3]), as well as, in a general setting and in a more 
suitable language closer to the Fortuin-Kastclcyn mapping of statistical mechanics, 
in the papers by Kawashima and Gubernatis [9, 34]. We shall use both the worldhne 
picture and the vertex picture of ref. [2, 3], in order to provide a bridge between the 
existing formulations and to make the simple geometry of the problem as obvious as 
possible. In section 2.10 we point out that for many models it is possible to sum 
over all spin variables to obtain a pure loop model. We then cover the continuous 
time limit. Finally, we introduce improved estimators, the single cluster version, and 
describe the performance of the loop algorithm. 
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2.3 Kandel-Domany framework 

A brief overview of the basics of Monte Carlo algorithms is given in Appendix 
A. The derivation [2, 3] of the loop algorithm is similar to that for the Swendsen 
Wang cluster algorithm in statistical mechanics [15] which uses the Fortuin-Kasteleyn 
mapping of the Ising model to an extended phase space. (For an excellent review see 
e.g. ref. [60]). A general formalism for such a mapping was given by Kandel and 
Domany [61]. Here we use a more suitable language similar to that of the general 
framework of Kawashima and Gubernatis [9], who made the Fortuin-Kasteleyn- like 
nature of the mapping obvious. 

For future use, we first write down the general scheme, without yet making refer- 
ence to individual spins, loops, or plaquettes. We start with a set {S} of configurations 
S and a set {G} of graphs G, which together constitute the extended phase space. 
The partition function 

z^T.ns) (2.11) 

s 

depends only on <S. In addition we now choose a new weight function W{S,G) which 
must satisfy 

EcWiS^G) = W{S), 

W{S,G) > 0. ^^-^^^ 
Thus we have a Fortuin-Kasteleyn-like [62,63] representation: 

Z^J:T.W{S,G) (2.13) 

S G 

A Monte Carlo update now consists of 2 steps: 

i) Given a configuration S (which imphes W{S) ^ 0), choose a graph G with 
probability 

ii) Given S and G (this implies W{S^ G) 7^ 0), choose a new configuration (5', G') 
with a probability p ( (5, G) {S', G') ) that satisfies detailed balance with re- 
spect to W{S,G): 

W{S,G)xp{{S,G)^{S',G')) = W{S',G')xp{{S',G')^{S,G)) , (2.15) 
for example the heat-bath-like probability 

p ( (5, G) ^ (5', G') ) ^ ^(g,G)X^GO + c.^^ • 
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Then the mapping 5 — > <S' also satisfies detailed balance with respect to the original 
weight W{S). Proof: 

W{S)p{S^S') = W{S) EG,G'PiS^{S,G)) p{{S,G)^{S',G')) 

= W{S) ^a,G' ^ P ( (5', G') (5, G) ) 

= W{S') Eg,g' P ( (S', G') (S, G) ) (2.17) 

= W{S') Eg,g' P{ S' ^ (5', GO ) P ( (^5', GO ^ (5, G) ) 

= W{S') p{S'^S) . 

(Within a Monte Carlo simulation the denominators in eq. (2.17) cannot vanish.) 



2.4 Exact mapping of plaquette models 

We apply the Kandel-Domany formalism to a model defined on plaqucttes, with 
W{S) = A,i,US) X W-^'««(5), 

To cover the general case, we have split off a global weight factor Agiobai- This split 
is not unique. ^Wc devise an algorithm for W^''"''^{S) = YlpWp{Sp). Because of its 
product structure, we can perform the decomposition into graphs separately on every 
plaquette. Thus in analogy with eq. (2.12) we look for a set of "breakups" Gp and 
new weights Wp{Sp, Gp) on every plaquette p which satisfy 

EG,Wp{Sp,Gp) = Wp{Sp), 

Wp{Sp,Gp) > 0. ^^-'^^ 
This imphes eq. (2.12) again, both for the plaquette part 

WP^-^{S) = n E ^pi^P^ = E n ^MSp, Gp) ^ E W^'^^S, G) (2.20) 

p G, yj^G, P G 

(where G = UpGp, W^^'^««(cS, G) = Hp Wp{Sp,Gp)) and for the total weight W{S) = 
Eg ^('5, G) with 

W{S,G) = Agi,US)y<WP^^'^{S,G) 

= Agiobai{S)xnpWp{Sp,Gp) . ^^-^'^ 

Thus we can apply the Kandel-Domany procedure. Restricting ourselves to G' = G, 
the two steps i), ii) in the previous section now become the procedure for the loop 
algorithm. Starting with a configuration 5, a Monte Carlo update consists of: 



*For most of the subsequent discussion, we will implicitcly assume that the global weight can be 
factorized into independent contributions from different clusters. 
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(1) Breakup: For each plaquette, satisfy eq. (2.14) by choosing Gp with probabihty 

p(S.^(S.G,))^5^i^. (2.22) 

(2) Fhp: Choose a new configuration (S', G) with a probabihty p ( (S, G) — > (5', G) ) 
that satisfies detailed balance with respect to W{S, G). 

In the next section we shall explicitely find a suitable set of breakups Gp and 
plaquette weights Wp{Sp, Gp). 



2.5 Structure of plaquette weight functions 

Given a graph G, we demand that VF^'"^(<S, G) does not change 

W^^^^S, G) = G) (2.23) 

upon any spin update allowed by VF^'"^(»S', G) ^ 0. Then it cancels in eq. (2.16), 
which can now be written as 

^global[.>J) -r 2^S'^S,W{S',G)^0^global\<-> ) 

The configurations <S' for which W{S', G) will be those reached by cluster fiips. 
By enforcing eq. (2.23), all cluster flips will become independent of each other, up to 
acceptance with Agio^ai- Eq. (2.23) is equivalent to 

W^^^^iS, G) = A(5, G) V{G) , A(5, G) := | J| ^^erwfsf ^ ' ^^.25) 

which is the form introduced in ref. [9]. Thus 

W{S, G) = A(5, G) V{G) Agi^US)- (2.26) 

We shall achieve eq. (2.23) by enforcing it on every plaquette: 

Wp{Sp,Gp)^Wp{S'^,Gp). (2.27) 

Then eq. (2.26) also holds on the plaquette level: Wp{Sp,Gp) = A{Sp,Gp)V{Gp). 
The nontrivial part in this point of view is that all allowed plaquette updates Sp^S'p 
match for different plaquettes, to give an overall allowed update S ^S'. As we have 
seen in section 2.2, it is the six- (or eight-) vertex constraint, stemming from local 
conservation of (or mod 2) in the Hamiltonian, that makes these plaquette 
updates match in the form of loops. In other words, by enforcing eq. (2.27), we will 
achieve that all clusters (sets of loops that are glued together at frozen plaquettes) 
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constructed during the breakup-step can be flipped independently, up to acceptance 
with pfiip, eq. (2.24). 

Let us now find weights satisfying eq. (2.27). Independent cluster fiips require 
that eq. (2.27) at least include the case S'p = Sp, where all four spins at a plaquette 
are fiippcd, Wp{Sp, Gp) = Wp{Sp, Gp), which implies the requirement 

Wp{Sp) = ^ Wp{Sp, Gp) = ^ Wp(Sp, Gp) = WpiSp) (2.28) 

Gp Gp 

on the plaquette weights Wp{Sp). The first step in our construction is therefore to 

Choose Agiobai such that Wp{Sp) = Wp(Sp) . (2.29) 

Such an Agiobai always exists. It is not unique. The ideal case is Agio^ai — const, since 
then for each cluster, pfup = | can be chosen. See also section 4.3. 

For worldline models, there are a total of eight allowed spin configurations Sp = 
= 1±, 2±, 3±, 4±, as shown in figure 2. With eq. (2.28), the plaquette weight Wp{Sp) 
depends only on i. Following ref. [3], let us 

Define a different "breakup" Gp := G''^ = G^^ for every transition i ^ j , (2.30) 

such that the breakup G'^^ allows exactly the transitions i ^ j. Thus we define 

W,(S,M'):=[f' l!:^ = (2.31) 

with suitable constants w'^^ = w^^. We have satisfied eq. (2.27) by construction. By 
inspection of figure 2 we see that every transition i ^ j, i 7^ j, corresponds to the 
flip of 2 spins on a plaquette (all four spins for i'^ <-> i~). 

We also see by inspection of figure 3 that, given the current worldline configuration 
Sp = , we can identify each of the 4 breakups G'^^ , j — 1,2,3,4, with one of the 
graphs in figure 3. Namely, fiipping 2 of the spins connected in the graph for G*-', 
i 7^ j, leads to one of the two plaquette configurations j^, fiipping the other two 
spins leads to the other configuration, fiipping all four spins maps from i"^ to i^. 
For an example, see the figure caption. Therefore, given a worldline configuration, 
the combined breakup G = [jpGp can be represented^ as a graph consisting of the 
plaquette-elements in figure 3. In many cases we can also transform the worldline 
model entirely into a loop graph model; see section 2.10. 

Since Gp connects pairs of sites, the breakups of all plaquettes will combine to give 
a set of clusters consisting of loops, as already described in section 2.2. When there 
is no freezing, i.e. no l)reaku])s (7** occur, then all clusters consist of single loops. 

'^In general we should distinguish between the breakups G^^ , of which there are 6 (10) in the six 
(eight) -vertex case, and the fewer (4) graphical representation in figure 3. Kawashima has shown 
that one can also give a common graphical representation of for all (ij) [34]. This representation 
requires more than one loop-element per site. 
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We still need to find constants w^^ = i,j — 1, 2, 3, 4, such that the constraint 
eq. (2.19) is satisfied, which now reads ^° 

' (2.32) 

w'^ > 0, 

(with Vr(4) = in the six- vertex-case). This constraint underdetermines the w'^K 
There are 3 equations for 6 unknowns in the six-vertex case, and 4 equations for 10 
unknowns in the eight-vertex case. It can always be solved. One explicit solution is 
the following: Let W{k) be the smallest of the n weights W{j), j = 1, .., n (n is 3 or 
4). Eq. (2.32) is satisfied by 

w'^ = W{k)/n for i 7^ j , 

= W{i) - Ej^i w'^ for i = 1, .., n . ^ > 

Experience tells us that for an efficient algorithm, one should keep the loops as 
independent as possible. Thus we should minimize the weights w** which cause loops 
to be glued together. Let W{1) be the largest of the n weights W{j). Given a solution 
w'^^ we can always find another one in which no diagonal element w** except at most 
-u;" is nonzero [34] . For example, to remove w^^ ,j > 1 , define 

w'^'^ = 

w'^'^-i =w'^-'^'^ ^w^-^'^ + (2.34) 
Iterating this transformation leads to the one surviving diagonal element 



J2w^\ (2.35) 



More explicit solutions are given in section 2.7. 



2.6 Summary of the loop algorithm 

Since the detailed derivation of the general formalism was a bit tedious, we sum- 
marize the actual procedure here. Start with a model in worldline representation with 
Z^j:sW{S),eq. (2.11). 

(1) Choose a split W{S) = Agiobai{S) x UpWp{Sp), eq. (2.18), such that Wp{Sp) = 
Wp(Sp), eq. (2.29). 

(2) Find new weights w^^ = w^^ > such that X^jW*-^ = W{j), eq. (2.32), while 
preferrably minimizing the "freezing" weights w'*, see eq. (2.34). (See also 
section 3.3). 

i°Eqs. (2.31), (2.32) are eqs. (15), (16) in ref. [3]. 
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Each Monte Carlo update from a worldline configuration to a new one then involves 
the following steps: 

(i) (Breakup) For each shaded plaquettes, with current spin configuration i"^, choose 
a breakup G*-' with probability p = w'^^ /W{i), proportional to the graph weight 
w'^. (See eqs. (2.22,2.31) and section 2.7). 

(ii) (Cluster identification) All plaquette breakups together subdivide the vertex 
lattice into a set of clusters, which consist of closed loops. Loops which have 
a frozen vertex ("G**") in common belong to the same cluster. Identify which 
sites belong to which clusters. (This is in general the most time consuming 
task). 

(iii) (Flip) Flip each cluster separately^^, one after the other, with (e.g.) heat-bath 
probability for Agioj^ai- In case Agi^bai = 1, this means that one can flip each 
cluster independently with probability |. "Flipping" means to change the sign 
of Sfi on all sites in the cluster. If desired, one can artificially restrict the 
simulation to some sector of phase space, e.g. to the "canonical ensemble" 
of constant magnetization, by prohibiting updates that leave this sector, or one 
can select such sectors a posterior [64,65]. (See also section 4.3). 



2.7 Graph weights for the XXZ, XYZ, and Heisenberg model 

We now come back to our example and compute [2, 3] one solution for the weights 
w^^ = and thus the breakup and flip probabilities, for the spin-flip symmetric 
six- vertex case, with weights a,6,c, eq. (2.6). This includes the Heisenberg model 
and the XXZ-model at h = (eq. (2.6)) in any dimension (see section 2.12). Some 
solutions for the general XYZ model are also given. We need to flnd a solution to eq. 
(2.32). Here it reads^^ 

a ~ i-^Jz 

c ~ 1 (2.36) 

From figure 3 we see that w^^, w^^, and w^^ correspond to vertical, horizontal, and 
diagonal breakups, respectively. The weights w** correspond to transitions i^^i^, 
i.e. to flipping zero or four spins on a plaquette. They freeze the value of the weight 
W{i). Experience tells us that we should minimize freezing in order to get an efficient 

^^Alternatively one can perform a combined flip of a randomly chosen subset of clusters. However, 
when Agiobal is not unity, this will in general produce bigger variations of Agiobal and therefore lower 
acceptance rates. 

^^We have multiplied the weights in eq. (2.6) by exp {—AtJz/4:) and also provided the expansion 
to order At for later use in the continuous time version. 



,11 




+ w^^ 


= W{1) = 


,22 






= W{2) = 


,33 


+ W^^ 


+ w'^^ 


= W{3) = 
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algorithm, in which then loops are as independent as possible. We will construct 
solutions with minimal freezing; others exist. See also sections 2.16 and 3.3. 

Eq. (2.36) has different types of solutions in different regions of the parameter 
space (a,6, c). Remarkably, these regions are exactly the same [2,3] as the phases 
of the two-dimensional classical six- vertex model [54,57,58], shown in figure 4. The 
regions of figure 4 have been spelled out in terms of the coupling constants Jx,Jz at 
the end of section 2.1. 

Region IV (AF), has antiferromagnetic couplings Jz > \Jx\, thus c > a + b. To 
minimize the freezing of weight c, we have to minimize w'^'^. From eq. (2.36), w'^'^ = 
c — a — b + w^^ + w^^ + 2w^^. With w*-' > this imphes w^^'""" = c — a — b. This 
minimal value of w^^ is achieved for w^^ — w^^ — 0, i.e. when we minimize all freezing. 
The optimized nonzero parameters for region IV are then: 

= a ~ 1 — ^ (vertical breakup), 

w^^ ^ b ~ ^\Jx\ (horizontal breakup), (2.37) 

= c — a — b ~ ^[J^ — \J^\) (freezing of opposite spins) , 

without any diagonal breakups. This has to be modified for non-bipartite lattices; 
see section 2.9. For an alternative to freezing, see section 3.3. 

In region I (FM) with ferromagnetic couplings < —\Jx\ and a > b-\- ewe get 

— c ~ 1 (vertical breakup), 

w^^ — b ~ I Jj-I (diagonal breakup) , (2.38) 

w^^ — a — c — b ~ ^ {\Jz\ — \Jx\) (freezing of equal spins) , 

without any horizontal breakups. ( This is similar for region 11, b > a + c, which 
does not correspond to a quantum model. There we obtain minimal freezing from eq. 
(2.37) with indices 2 and 3 interchanged, and no vertical breakups.) 

Region III (XY-like) has \Jz\ < \Jx\, and a, 6, c < ^{a + b + c). Here we can set all 
freezing probabilities to zero, obtaining 

2w^^ = a + c-b ~ 2-^{\Jx\ + Jz) (vertical breakup), 

2^23 ^ c + b-a ~ ^{\Jx\ + Jz) (horizontal breakup), (2.39) 

2w^^ — b + a — c ~ ^ {\Jx\ — Jz) (diagonal breakup) . 

The isotropic Heisenberg model is located on the boundaries of region III. The anti- 
ferromagnet Jz — \Jx\ has c — a — b — 0, thus only vertical {w^'^) and horizontal {w'^'"^) 
breakups. The ferromagnet Jz — —\Jx\ obeys c — a + b — and has only vertical 
(w^^) and diagonal (w^^) breakups. There is no freezing for the isotropic model. 

The classical Ising model is reached in the limit Jx/Jz = 0, since then 6 = 0, so that 
there is no more hopping and all worldlines are straight. Remarkably, in this limit 
the loop algorithm becomes [55] the Swendsen-Wang cluster algorithm [15] ! Frozen 
plaquettes connect the sites of clusters in the Ising model, i.e. they correspond to the 
"freezing" operation [61] of the efficient Swendsen-Wang method. 
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The classical BCSOS model is simulated for a = 6 [1]. When a = h = \c, the loop 
algorithm constructs [1] the boundaries of the clusters which the VMR-cluster algo- 
rithm [66-70] for the (l+l)-dimensional BCSOS model produces, i.e. it constructs 
these clusters more efficiently. The loop representation was used in ref. [71] to obtain 
exact analytical results for this model, and in ref. [72, 73] to study the roughening 
transition of the BCSOS model. 

General XYZ case: (See also ref. [34] for explicit solutions.) The loop algorithm 
remains unchanged in the XYZ case (see section 2.2), except that at breakups G*'^, 
the arrows flip direction. In each of the four ordered regions of the XYZ model we 
have W{m) > J2i^m W{i) for one m G {1, 2, 3, 4}. The nonzero breakup weights with 
minimum freezing are then 

This also summarizes the solutions for regions I, II, IV above. In the disordered region 
2W{m) < J2i W{i) we can set all freezing w^^ to zero, and in general still have 6 free 
parameters w*-' with only 4 constraints eq. (2.32). 



(2.40) 



2.8 Recipe for the spin | Heisenberg antiferromagnet 

In order to make the loop algorithm as clear as possible, we restate the procedure 
for the important yet simple case of the isotropic spin | Heisenberg antiferromagnet. 
See also section 2.13, for the continuous time version, and section 3.6 for the SSE 
variant. 

A Monte Carlo update leads from a worldlinc configuration S of spin variables 
S?i — ±1 to a new configuration S'. On each shaded plaquette p, the local spin 
configuration Sp takes one of the six possibilities shown in the left part of figure 2, with 
weights Wp{Sp) given in eq. (2.6), satisfying a+b = c in the isotropic antifcrromagnetic 
case. The weights w^^ in eq. (2.37) are all zero except for w^'^ = a,w^^' = b, so that 
we get only vertical (G^^) and horizontal (G^^) breakups. The update consists of the 
following steps: 



(i) For each shaded plaquette, choose the horizontal breakup with probability 



r 0, i± 



tanh(^J) 5p = 2±, 
1, Sp — 3^ , 

(2.41) 

(sec cqs. (2.6,2.22,2.31,2.37)), otherwise choose the vertical breakup. 

(ii) Identify the clusters constructed in step (i). Since there is no freezing here, all 
clusters consist of single loops. 
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(iii) Flip each loop with probability |, where flipping means to change the sign of 
Sfi on all sites along the loop. This gives the new configuration S'. 

This procedure is even simpler than local worldline updates. Moreover, it remains 
completely unchanged in arbitrary dimensions (see section 2.12). The single cluster 
version is described in section 2.11. Note that one can and should avoid the Trotter 
approximation altogether, by working directly in continuous time, for which a mod- 
ification of this recipe will be given in section 2.13. or by using the stochastic series 
expansion, described in section 3.6. 

2.9 Ergodicity 

To estabhsh correctness of the loop algorithm, we still have to show ergodicity 
for the overall algorithm, including the existence of global configuration changes, 
Ergodicity is obvious when all w^^ > for i ^ j, and when Pfup is always nonzero 
(which is normally the case when we use eq. (2.24) for PfUp). Any two allowed 
configurations (i.e. W{S) ^ 0) are, as always, mapped into each other by a unique 
set of spin-flips (loop-flips), which are compatible with a set of breakups G"^, i ^ j. 
With w*^ > 0, this set of breakups has a finite probability to occur, and with pfUp > 0, 
the two configurations will be mapped into each other in a single Monte Carlo step 
with finite probability. Note that the trivially ergodic case w/-' > can always be 
constructed, as seen in eq. (2.33); this may not be an efficient algorithm, however. 
On the other hand, one can always construct weights w'^^ such that (for J^y 0) 
ergodicity is not achieved, for example by choosing w*^ = SijW{i), i.e. only freezing. 

When some of the w^^ vanish, ergodicity has to be shown case by case. With 
the weight choices of section 2.7, region 111 is trivially ergodic. We have to show 
ergodicity explicitely in each of the regions I, II, IV, because some w"^^ vanish there. 

Region I (including the Heisenberg FM): w'^^ = 0, i.e. there are only vertical and 
diagonal breakups (see figure 3). These breakups i)ennit a loop configuration which 
is identical to any given worldline configuration. That loop configuration will occur 
with finite probability. Flipping all loops in this configuration leads to the empty 
worldline configuration. Conversely, any worldline configuration can be generated 
from the empty one in a single (!) update by such a choice of loops. Therefore the 
algorithm is ergodic, mapping any two worldline configurations into each other in 
only two steps. 

Region IV (including the Heisenberg AF): w^^ = 0, i.e. there are only vertical and 
horizontal breakups. On a bipartite lattice with open or periodic spatial boundary 
conditions, ergodicity can be shown easily. Start with any worldline configuration 
«5 = {Sxi}- Our reference configuration this time is not the "empty" configuration 
S'^i = — |, but instead the staggered configuration S'^i = (— 1)'^(— |), i.e. the config- 
uration with straight worldlines on one of the two sublattices. As always, there is a 
unique set of loops whose fiips will map S into S'. By inspection we see that these 
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loops contain only vertical and horizontal breakups (horizontal where S' has diago- 
nal worldline parts, vertical elsewhere). Since these breakups have finite probability 
to occur, the whole set of loops will be constructed with finite probability. Thus, 
again, any worldline configuration will be mapped to the reference configuration with 
finite probability, and vice versa, so that on a bipartite lattice the algorithm is er- 
godic. Furthermore, on any lattice, the loop algorithm is at least as ergodic as the 
algorithm with conventional local updates. The latter consist of spin-changes around 
non-shaded plaquettes, equivalent to the flip of a small loop with two vertical and two 
horizontal breakups, which will occur with finite probability in the loop algorithm. 

Note that for a frustrated antiferromagnet, i.e. on a non-bipartite lattice, the 
algorithm eq. (2.37) with only horizontal breakups is not ergodic [42]: Loops switch 
time-direction at every breakup, thus a loop with an odd number of spatial hops is 
not possible. To ensure ergodicity, one has to include diagonal breakups with some 
weight < w^^ < b. Then ergodicity is trivial, since all w''^ > for i 7^ j. Eq. (2.36) 
is now solved with w^^ — b — w^^ and demands freezing w'^^ — 2w^^ of equal spins. 
The size of w^^ has to be chosen for optimal performance of the algorithm, which, 
however, is subject to a severe sign problem. 

For completeness, we mention region 11 (which does not occur in worldline models). 
Here there is no vertical breakup. In case of periodic spatial boundary conditions, 
interchange of "space" and "time" leads us to the situation of region I, for which we 
have already shown ergodicity. 

2.10 Transformation to a pure loop model 

Remarkably, by using the exact mapping eqs. (2.12,2.19) on which the loop algo- 
rithm is based, we can transform quantum spin and particle models into pure loop 
representations, i.e. into a completely different setting than the original worldlines. 
This is analoguous to the Fortuin-Kasteleyn representation [62, 63] of the Potts model. 
It was first achieved, independently, by Nachtergaele and Aizenman [10,11] for the 
one-dimensional Heisenberg model, and was used to prove rigorous correlation in- 
equalities [10]. Kondev and Henley used it to compute the exact stiffness and critical 
exponents of a twodimensional vertex model [71, 74]. See also section 3. 

We get to a loop model by explicitely summing over the spin degrees of freedom 
in eq. (2.13). Using eqs. (2.13,2.21,2.25) we see that 

Y.W{S) = Y.I[T.^iSp,G,)V{G,)A,i,,US). (2.42) 

{5} {S} P Gp 

The condition A{Sp, Gp) 7^ restricts the graph G to consist of clusters, i.e. divergence- 
free components compatible with the spin configuration. 

The summation over spin configurations S can easily be done if a reference spin 
configuration Sq exists (see also section 2.9), in which all plaquette breakups Gp with 
nonvanishing weight V{Gp) are allowed (i.e. Wp{Spo,Gp) > whenever V{Gp) > 0). 
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Then aZ/ graphs with finite weight can be constructed from 5*0 . By design, cluster flips 
do not change W{S, G) when Agiobai{S) is constant. Each cluster can then be flipped 
independently and contributes a factor 2. With the weight choices from section 2.7, 
we see that for the AF region IV, the antiferromagnetically staggered configuration 
is such a reference configuration on bipartite lattices: it allows all relevant breakups 
(vertical, horizontal, freezing of unequal spins) on any plaquette. For region I (FM), 
the ferromagnetic spin configuration serves the same purpose on any lattice. 

In these regions (as well as in region II), we can then sum over spin configurations 
in eq. (2.42). Without external weight Agiohai, we get 

^= E fn«^'') 2^^^""^ = Y.W{G), (2.43) 

where Nc{G) is the number of clusters in G. When Agiobai is a product over contribu- 
tions from each cluster (e.g. in case of a magnetic field), the factor 2 for each cluster 
is replaced by {Agiobai{S{G)) + Agiobai{S' {G))) . 

When there is no reference configuration, e.g. for region III (XY-like) or for an- 
tiferromagnets on non-bipartite lattices, or for different choices of breakup-weights, 
we can still obtain a pure loop model. Now there can be clusters which do not corre- 
spond to a continuous worldline configuration (i.e. the spin directions mandated by 
independently chosen breakups at different plaquettes within a cluster may contra- 
dict each other). To remove graphs containing such clusters, we temporarily endow 
each loop with a direction, and introduce a constraint into the sum over graphs in 
eq. (2.43) enforcing compatibility of the loop directions of each cluster. 

The mapping generalizes immediately to the anisotropic XYZ-like case, where 
the number of a priori breakup-possibilities per plaquette doubles, though they are 
graphically still the same as in the XXZ-like case (see figure 3). 

We have thus mapped all XYZ-like quantum spin and particle models, with any 
choice of breakup weights and in any dimension, to a loop model, in complete analogy 
with the Fortuin-Kasteleyn mapping [62, 63] of statistical mechanics. This mapping 
is useful for analytical purposes (see above). Note that for the Heisenberg antiferro- 
magnet, the loop model consists of antiferromagnetic selfavoiding polygons, and for 
the Heisenberg ferromagnet it has a similar graphical representations as the world- 
lines themselves. Note also that for a given physical model there are many different 
loop models, corresponding to the different possible choices of breakup-weights. Re- 
markably, the graph-decomposition eq. (2.19), and thus the transformation to a loop 
model, can also be written on an operator level [12] (see section 4.6). All observables 
can be measured in the loop representation [12], as correlation functions ("improved 
estimators", see sections 2.14, 4.6) or as thermodynamic derivatives. 

One can perform Monte Carlo simulations purely in the loop representation, ana- 
loguously to Sweeny's method for the cluster representation of the Ising model [75]. 
Indeed, closer inspection reveals that the Handscomb method for the ferromagnet is 
equivalent to a Monte Carlo in loop representation with stochastic series expansion 
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[76-81] ! For other models this has not yet been tried (but the method of section 

4.8 comes close). In more than one spatial dimension, it is computationally more 
difficult than the loop algorithm with graphs and spins, since one has to keep track of 
the number of clusters, which can require traversing complete clusters for each local 
update, unless it can be simplified by a binary tree search. 

2.11 Single-Cluster Variant 

As in Swendsen-Wang Cluster updates, there are several ways to perform an up- 
date of the worldline configuration S with the rcqiiired detailed balance with respect 
to W{S,G), eq. (2.15). There are two important approaches: 

(i) Multi- Cluster Variant: Determine the whole graph (set of loops) G and flip each 
cluster (set of mutually glued loops) in G with suitable probability (see section 
2.6) 

(ii) Single- Cluster Variant (Wolff-cluster) [1,82] Pick a site {io,lo) at random, and 
construct only the cluster which includes that site. This can be done iteratively, 
by following the course of the loop through (ioJo) until it closes, while deter- 
mining the breakups (and thus the route of the loop(s)) only on the plaquettes 
which are traversed. This corresponds to our initial loop-description in section 
2.2. At each plaquette at which a frozen breakup G** is chosen, the current loop 
is glued to the other loop traversing this plaquette. That other loop (and any 
loops glued to it) then also has to be constructed completely. Flip the complete 
cluster with probability pjHp satisfying detailed balance with respect to Agiobai 
to get to a new spin configuration. Note that when Agiobai — const, we can 
choose pfiip — 1 instead of |. 

Both approaches satisfy detailed balance in eq. (2.15). One may think of the single- 
cluster variant as if all clusters had actually been constructed ffist, and then one of 
them chosen at random, by picking a site, to make an update proposal. 

The advantage of the single-cluster variant [60, 82, 83] is that by picking a random 
site, one is likely to pick a large cluster, whose flip will produce a big change in 
the configuration and thus a large step in phase space. This can reduce critical 
slowing down still further. The effort (computer time) to construct the single-cluster 
is proportional to its length. Normalized to constant effort, one finds indeed that 
the single-cluster variant (and the corresponding "Wolff-algorithm" for Swendsen- 
Wang-like algorithms) usually have even smaller dynamical critical exponents (see 
appendix B) than the multi-cluster variant. Note that improved estimators get a 
different normalization in the single-cluster variant (see eq. (2.57) and below). In some 
circumstances, the multi-cluster variant can still be advantageous overall, for example 
when employing parallel [84-86] or vectorized [87] computers. It is also essential for 
the meron method (section 4.8) and e.g. four-point functions as improved estimators 
(section 4.6), and it is easier to implement in continuous time. 
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The single-cluster method, on the other hand, generalizes into the worm and 
directed-loop methods discussed in section 5, which are applicable to any discrete 
model. 

2.12 Arbitrary spatial dimension 

There is vitually no change algorithmically in going to higher dimensions [1], if 
one chooses to stay on a vertex-lattice. Let us look at two spatial dimensions as an 
example. The even/odd split of the Hamiltonian in eq. (2.2) can be generahzed to 

H^Y.H.^E12Hi,i+o (2.44) 

with a separate for each direction of hopping (resp. spin coupling) in the Hamilto- 
nian H. For a two-dimensional square lattice with nearest neighbor hopping we thus 
get 4 parts H^, each the sum of commuting pieces living on single bonds, in complete 
analogy with the one-dimensional case. 

After the Trotter-Suzuki breakup, eq. (2.3), these single bonds again develop into 
shaded plaqucttcs. Each Trotter timeslice now has 4 subslices. Locally on each 
shaded plaquette we have the identical situation as in (1+1) dimension. Thus the 
loop algorithm can be applied unchanged. The only thing that changes is the way 
that different plaquettes connect. (Thus it is easy to write a loop-cluster program 
for general dimension. This contrasts with the traditional local worldline updates, 
where a number of different rather complicated updates are necessary [88] to achieve 
acceptable performance). The same construction can be applied as long as the lattice 
and the Hamiltonian admit a worldline representation in which commuting pieces of 
the Hamiltonian live on bonds. 

2.13 Continuous time 

As one of the most important developments. Beard and Wiese [16] have shown 
that within the loop formulation, one can directly take the time continuum limit 
Ar— >-0 in the Trotter-Suzuki decomposition, eq. (2.3). In fact, it turns out that one 
can also write the original spin model in operator language, directly in continuous 
time when H has a countable basis (see section 3) [10, 89, 90]. 

In continuous time it is appropriate to describe worldlines by specifying the times 
t at which a worldline jumps to a different site. This jump is now instantaneous. 

Let us look again at the simple case of a spin | antiferromagnet. Figure 7 shows 
part of a worldline configuration. In discrete time, this picture would be subdivided 
into plaquettes of temporal extent At, like figure 1. On each such plaquette, the 
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Figure 7: Loop construction in continuous time. Thick solid lines denote worldlines; the thick dashed 
line is a partial loop. 



probability of a horizontal breakup is given by eq. (2.41) 

0, or 2 worldlines on plaquette , 
tanh(^ J) 1 straight worldhne , (2.45) 

1, worldline jumps 



p(horiz. breakup) = < 



Now look at a specific lattice bond, e.g. (jk) in figure 7. For any time interval in which 

the plaquette occupation on this bond does not change, the breakup probability is 
constant. Then the breakup probability per time has a continuous time limit, i.e. it 
becomes a constant probability density. For lattice bond (jk) in figure 7 this is for 
example the case between times ti and 

p(horiz. breakup) at-^ J 
At ~^ 2 

Between times t2 and t^, the probability for a horizontal breakup on this lattice bond 
from eq. (2.45) is zero. On plaquettes without horizontal breakup, there is a vertical 
one, which means that loops will just continue in imaginary time without a jump. A 
third case occurs on lattice bond (ij) at time ^3. Here the probability for a horizontal 
breakup from eq. (2.45) is 1. 

The same pattern holds for the general case: All breakup probabilities on plaque- 
ttes are either 0, 1, or proportional to At (for small At), and thus have a continuous- 
time limit. Note that the probabihty densities are generated by the order At of pla- 
quette weights. Thus they contain matrix elements of the Hamiltonian (or parts of 
it), and no longer the exponential of Ti.. 

The multi-loop algorithm, summarized in section 2.6, therefore obtains a modified 
breakup-step in continuous time. For each lattice bond: 

(a) Identify each region of imaginary time in which the worldline configuration on 
this bond does not change. Randomly assign horizontal or diagonal breakups 
there with constant probability density, given by the continuous time limit of 
eq. (2.22). 
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(b) At times t where a worldline jumps across the lattice bond, there will be a 
non- vertical breakup with probability one. For example, in case of the Heisen- 
berg antiferromagnet, this will always be a horizontal breakup. In case of the 
XY-model, eq. (2.39) implies probabilty | for both horizontal and diagonal 
breakups. 

The rest of the algorithm remains unchanged. The technical implementation does 
however change completely. In continuous time, one can no longer store plaquette 
configurations. Instead, worldlines are specified by the events at which a worldline 
jumps to a neighboring site. It is useful to store a doubly linked list of such events for 
each site, with each list-item containing pointers to the preceeding and to the following 
event on the same site, and a specification of the nature of the event including its 
time t. (For the isotropic ferromagnet one needs only a singly linked list since all 
movement is forward in imaginary time). The breakup step can then be performed 
for each lattice bond, by following the lists for the corresponding two sites. Breakups 
can, e.g., be inserted as a different kind of event into the same lists, or be stored in 
separate lists. Identification and flipping of clusters then involves manipulations of 
these linked lists. 

The single-loop variant (section 2.11) can also be performed in continuous time. 
Instead of deciding breakups bond by bond, we now follow an individual loop-end 
along sites. 

Choose a site j at random. Start a loop at an arbitrary time ti, moving (e.g.) 
upwards in time. An example is shown in figure 7. The loop construction iterates the 
following procedure: 

Determine the time interval ti < t < t2 (equations are specified for moving up- 
wards in time) during which the worldline occupation of the neighboring sites does 
not change. Technically this can be implemented by using linked lists of events as 
above, and including additional pointers for each event, e.g. to preceding events on 
neighboring sites. For each such neighbor k draw a random number Tjk from the 
distribution Aj^exp {—^jk^jk), where Xjk — Jjk/'^ is the corresponding breakup prob- 
ability density. Now move the loop-end on site j up to time ti + min^ (rjk) and let 
it jump to the corresponding site k there^^ .The situation is the same as in radioac- 
tive decay, with decay constants Xjk and neighboring sites corresponding to decay 
channels. 

Moving in time-direction on site i corresponds to choosing a vertical breakup for 
all infinitesimal "plaquettes" connecting site i to its neighbors. When Jjk corresponds 
to a horizontal breakup, the loop-end will move in the opposite time-direction at the 
new site. When Jjk corresponds to freezing, the loop branches and becomes a cluster 
of loops, as usual. 

In the standard loop-algorithm, loops are non-self-overlapping and correspond 
to those of the pure loop-representation of the simulated model (sections 2.10 and 

^^Alternatively one can use the sum of the rates Xij to determine a transition time, and then 
decide where to move, according to the ratios of the . 
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3). Thus in the above construction one has to exclude those temporal regions of 
neighboring sites which have already been visited by the single loop (resp. cluster). 
This constraint is removed for the so-called directed loops, section 5.3. 

In case that all transition times ti + Tjk exceed t2, the loop-end stays at site j and 
moves to time t2- If the worldline at site j jumps at time t2, the loop must also jump, 
with the same probabilities as in the discrete time formulation. 

Now iterate this procedure. Eventually the loop closes and can be flipped as 
usual. 

We see that in continuous time, the single loop method is technically more involved 
than the multi-loop algorithm. It also lacks some of the improved estimators of 
the multi-loop method (see sections 4.6,4.8). On the other hand it tends to have 
still better autocorrelations. It can also be generalized to the directed-loop method 
(section 5.3). 

The continuous time limit has several important advantages over the discrete time 
case. It removes completely the systematic error from the Trotter breakup, thus also 
removing the cumbersome need for calculations at several values of At in order to 
extrapolate to At = 0. In addition, worldlines are specified much more economically 
by just specifying their transition times. This helps especially at low temperatures, by 
strongly reducing the storage requirements for a simulation. Longer range couplings 
imply larger sets of neighbors j to treat at each step. This is still cumbersome, but 
more economical than introducing extra Trotter shces. The advantages of the loop 
algorithm are preserved. 

Other approaches which work without Trotter approximation are the worm algo- 
rithm (section 5.1) and the stochastic series expansion (SSE) (sections 3.5,5.2, as well 
as directed loops (section 5.3). 

2.14 Improved Estimators 

In addition to the reduction of autocorrelations, the combined representation eq. 
(2.13) allows a potentially drastic reduction of statistical errors by using so-called 

improved estimators [42,46,75,82,83,91]. The Monte Carlo procedure provides us 
with a series of configurations Si. For each such configuration, we construct a graph, 
with some Ui clusters. We can then reach any state in a set J-'i of 2"' worldline 
configurations by fiipping a subset of the clusters. The probabihty for each of these 
configurations is determined by the cluster fiip probabilities Pfup. In the loop algo- 
rithm one configuration »Si+i will be chosen randomly according to these probabilities 
as the next Monte Carlo configuration. The standard thermal expectation value of 
an observable O is calculated by averaging over the value of the observable in the 

^''Note that when the transition probabihty per time is constant, the stochastic determination of 
a transition time can be interrupted and iterated arbitrarily without affecting the outcome. Thus 
the interruption at t2 is allowed. 
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configurations Sf. 

{0)^Y.^{S,). (2.47) 

i 

An improved estimator witli {O) = (Oimpr) can formally be defined as a weighted 
average of O over the states S' that can be reached from S with any vahd Monte 
Carlo procedure. With the loop algorithm wc sum over the 2"^' states S' & Ti. Since 
this is a sum over many states, it has reduced variance. Ideally, it can be calculated 
completely and then depends only on the graph G 

where we have used eq. (2.26). Note that Oimpr{G) is the representation of O in the 
pure loop model (see sections 2.10, 4.6). Alternatively, the sum in eq. (2.48) can be 
evaluated stochastically, 

Oimpr{S,G) = Pg{S ^ S') 0{S') , (2.49) 

where poiS — > S') is any probability that satisfies detailed balance for Agiobai (and 
thus for W{S, G)); it need not be equal to the actual update probability. Usually it 
is a product over suitably chosen cluster flip probabilities PfUp. We need to calculate 
this average over 2"' states in a time comparable to the time needed for a single 
measurement. Fortunately that is often possible. 

Especially simple improved estimators can be found in the case that pflip = | for 
all clusters. Then eq. (2.49) simphfies to 

since all of the states in jFj now have the same probability 2~"'. In order to achieve 
Pfiip = I when there is a nontrivial global weight Agiobai, we can use a probability 
function Pg{S — >• S') that has some clusters fixed in a certain state, and then has 
a new fiip probability of | for all other clusters. There are many possibilities. One 
can for example fix a cluster in its present state with Metropolis probability, or one 
can [42] fix the state of a cluster with probability = |2pfiip — 1|, in its present 
state if pflip < |, and in its flipped stated otherwise. The improved estimators then 
contain the usual contributions from both states of the non-flxed clusters, as well as 
contributions from the constant state of the fixed clusters [42] . In the following we 
assume for simplicity that no clusters are fixed. 

Let us calculate some useful improved estimators. Consider as an example the 
spin correlation function (S^S^) between two spins at spacetime sites j = (r, r) and 
k = (r',r'). Since each spin can be in one cluster only, the improved estimator eq. 
(2.50) is 



4(^1^. 



_ j CTjCrfc, if j and k are in the same cluster, 

j^khmpr = I (1 _ 2pflipj) (1 - 2pflip,k) Tjak, Otherwise, 

(2.51) 
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where aj^k — il are the current values of the worldhne variables (+1 for a worldline, 
— 1 for an empty site). In case pmp = |, the improved estimator is extremely simple: 

A(QZQz\ _ ) <^j<^ki if sites j and k are in the same cluster, (o 
^bj^khmpr - I Otherwise. 

We see that the calculation of improved estimators of correlation functions requires 
even less effort than for non-improved estimators. Remarkably, the spin-spin corre- 
lation function corresponds to the size distribution of the clusters. In general one 
can compute n-point Greens functions, including off-diagonal ones, as improved es- 
timators from the geometric properties of the clusters (section 4.6). Note that for 
uniform correlations of the the Heisenberg FM and for staggered correlations of the 
Heisenberg AF, we always have (TjUk — +1 in eq. (2.52). 

The potential gain from using the improved estimator is easy to see when (TjCTk — 1- 
The expectation value is the same as for the unimproved estimator (JjCJk = =tl- When 
(O) is small (e.g. {O) ~ exp {—R/^) at large distance R — |r — r'|), then the variance 
of C is 

(O^) - {Of = 1 - {Of ^ 1 , (2.53) 
whereas the variance of Oimpr is 

{^Impr) ~ {^imprf — {^impr) ~ {^imprf ~ {^impr) = {^) ^ 1 ■ (2-54) 

For a given distance R, this gain is largest at small correlation length ^, whereas 
the gain from reducing autocorrelations with the loop algorithm is largest at large 
^. On the other hand the non-improved estimator can have a sizeable amount of 
self-averaging at small ^, so that the gain from using improved estimators as just a 
measurement tool can be moderate in practice. See, however, sections 4.6 and 4.8 for 
other drastic effects of using improved estimators. 

Especially simple estimators can also be derived for magnetic susceptibilities. The 
uniform magnetic susceptibility at vanishing magnetic field can be expressed as the 
sum over all correlation functions: 




where V is the spatial volume (number of sites), M — 2dLt is the number of time 
slices in d dimensions, and S^^^ — This simphfies [46] in the XXZ case, by using 

E]^E^rV= E E j^^lr-l E MC) (2.56) 

■J" (clusters c) ((r,T)in c) clusters c 

to the sum of the square of the temporal winding numbers Wt(c) of the clusters c: 

E Mcf) (2.57) 

clusters c / 
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In the single-cluster variant (section 2.11), the sum over the clusters in eq. 2.57 is also 
calculated stochastically. Since we pick a single cluster with a probability where 
|c| is the cluster size and MV is the number of sites in the space-time lattice, we 
have to compensate for this extra factor and obtain (%) = ^ (^^j^ Wt{c)'^'^ . Similarly, 
eq. (2.52) implies that for an antiferromagnet with only horizontal breakups, the 
staggered susceptibility corresponds to the sum over the squares of all cluster sizes. 

Further improved estimators can be constructed, including cases with a sign problem 
[42]. Moreover, it was discovered [20] that by clever use of the improved estimator 
for the fermion sign, one can perform fermionic simulations in a restricted class of 
models without sign problem; see section 4.8. 

2.15 Infinite Lattices and Zero Temperature 

The existence of a single cluster version (section 2.11) and of improved estimators 
for two-point functions which have support only on individual clusters, allows for a 
surprising variant of the loop algorithm, namely for simulations on borderless lattices, 
which implies infinite size (L = oo) and/or exactly zero temperature {(3 = oo), while 
the simulation itself remains imchanged. Evcrtz and von der Linden showed [92] 
that one need only iteratively repeat the construction of a single cluster with fixed 
geometrical starting point, within a spin background of unlimited size which gets 
updated iteratively when the clusters are flipped. As iterations proceed, the single 
cluster updates will thcrmalizc the surroundings of the starting point, up to further 
and further distance (with probability proportional to the two-point function). Thus 
the two-point function of the infinite size system becomes available, converged to 
farther disctances in space and/or imaginary time, as the computation proceeds. The 
infinite size data can be used as the asymptotic point in Finite Size Scaling. It is 
especially valuable in systems for which the finite size behaviour is not known [93, 94]. 
The calculation of correct error bars for the resulting two-point function needs special 
care [92]. For a given distance, the computational time is, as usual, by far dominated 
by measurements, not by thermalization. 

This method works whenever the two-point function drops sufficiently quickly, so 
that the corresponding susceptibility is finite, e.g. in quantum spin systems with a 
gap. We note that such a parameter range away from a phase transition is often the 
region of interest when comparing simulation results to experimental measurements. 

As an example we show results for a spin ladder system [95] with N — 2 and 
4 legs for the isotropic antiferromagnet (A = 1). The left side of figure 8 provides 
results for the equal time staggered spatial correlation functions along the chains. 
A fit to the infinite lattice result gives ^ = 2.93(2) ior N ^ 2 and ^ = 8.2(1) for 



The Loop Algorithm 



34 



1 

0.1 
0.01 
0.001 
0.0001 
1e-05 
1e-06 
1e-07 



L=10 


20 




L=oo * ' ^ J ^ 



5 10 15 20 25 30 35 




30 
25 
20 
15 
10 
5 





N=2 




■ N=4 (times 10) 



0.2 0.4 0.6 0.8 1 1.2 1.4 



1 

0.1 
0.01 
0.001 
0.0001 
1e-05 
1e-06 
1e-07 



L=20 


4.0 




l_=oo '^^ 



1 

0.1 
0.01 
0.001 
0.0001 
1e-05 
1e-06 





20 











5 10 15 20 25 30 35 40 45 



2 4 6 8 10 12 14 16 18 



Figure 8: Left: Equal time staggered spatial correlation function of isotropically coupled Heisenberg 
antiferromagnetic chains, a,t q± = n and (3 = oo, for N=2 (top) and N=4 chains (bottom). Center: 
Greens functions {S{q, 0) S{q, t)) at q~ (tt, tt) for L — oo, the infinite size system, with N = 2 (top) 
and = 4 (bottom). Results aX, L — 20, /3 = oo have been added to exemplify finite size effects. 
Right: Spectrum S{q,Lu) ai q— (tt, tt) for L = oo and f3 = oo. From ref. [92]. 



N = 4. The center part of figure 8 sliows greens functions for L = oo, the infinite size 
system. Whereas finite temperature calculations give results periodic in imaginary 
time, which have to be extrapolated, this approach provides the /3 = oo (T = 0) result 
directly. A fit to the exponential decay G{t) ~ e~'^^ directly provides estimates for 
the gaps A = 0.5059(4) at = 2 and A = 0.19(1) at iV = 4, consistent with previous 
calculations. Results for L = 20 and P = oo are also shown, to exemphfy the effect of 
finite size systems. 

Continuing the imaginary time greens function to real frequencies by the maximum 
entropy method provides the spectra on the right side of figure 8, in which the gaps, 
the single magnon peaks, and higher excitations for = 4 are clearly visible. 



2.16 Performance 

The most important advantages and limitations of the loop algorithm have already 
been summarized in the introduction. Let us be more explicit here. Further aspects 
of the performance are mentioned in the following sections. 

Autocorrelations: The biggest obstacle which the loop algorithm addresses are 
the long autocorrelation times of worldline algorithms with local updates, as discussed 
in the appendices (see eq. (B.ll)). They require a proportional increase in computer 
time, so that simulations for large systems and/or low temperatures quickly become 
impossible. The loop algorithm appears to remove these autocorrelations completely 
in many cases (without magnetic field), like the spin | Heisenberg AF in any di- 
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mension, the two-dimensional spin | XY-model, and the spin 1 Heisenberg chain. 

For large systems and low temperatures this can save many orders of magnitude in 
computer time. As one striking example, see the gain in autocorrelation time for the 
one-dimensional Hubbard model in figure 11 in section 4.7. 

Autocorrelations and critical slowing down have been carefully determined in the 
original loop algorithm paper [1] for the nonquantum six-vertex model, with the 
single-cluster variant of the loop algorithm. In [96], a related study was done in 
which spatial winding was allowed to vary, with similar results for autocorrelations. 
In the massless phase (infinite correlation length) at - = - = \/2, the loop algo- 
rithm completely eliminates critical slowing down, i.e. the autocorrelation times are 
small and constant. The dynamical critical exponent of the Monte Carlo method 
(see appendix B) was z^f' ~ for all measured quantities, and = 0.19(2) (or 
logarithmic dependence). On the KT transition line, the exponential autocorrelation 
times are slightly larger (up to 20 on a 256^ lattice), with z^^ = 0.71(5), yet for 
the integrated autocorrelation times, which are relevant for MC errors, we saw barely 
any autocorrelations in either case, up to the largest lattices of size 256^. Note that 
thus the dynamical critical exponent for the integrated autocorrelation time is zero 
here, different from that for the exponential autocorrelation time. Local updates, 
in contrast, indeed showed very long autocorrelation times, and z^'^' = 2.2(2), as 
expected. 

Other studies have also seen very small integrated autocorrelation times for quan- 
tum systems, not significantly increasing with L or f3, with both the single- and the 
multi-cluster version of the loop-algorithm for Heisenberg spin | systems in Id, 2d 
and on bilayers [97], a spin-1 ladder [98], and for a t-J chain [42]. Note that away from 
a critical point, integrated autocorrelation times can even decrease with increasing 
system size, due to self-averaging of observables [97]. 

Strong fields (resp. chemical potentials) can however seriously impair the perfor- 
mance. They are discussed in section 4.3 and 4.4. See also section 5.3. 

Improved Estimators: The use of improved estimators (section 2.14) provides 
additional gains. For example, in ref [99] it has been possible to calculate the spin- 
spin correlation function (which in standard updates has large variance) down to 
values of 10 

Change of global quantities: Since the loops are determined locally by the breakup 
decisions, they can easily, "by chance", wind around the lattice in temporal or in 
spatial direction. An example is given in figure 6. The flip of such a loop then changes 
a global quantity (magnetization, particle number, spatial winding number). (Of 
course one can also choose to restrict the simulation to part of the total phase space, 
e.g. the canonical ensemble by not allowing such flips). This kind of conflguration 
change is virtually impossible with standard local methods. It has been used to 
investigate e.g. the KT transition in the quantum XY model [100, 101]. 

Freezing: For the loop algorithm itself, apart from effects of global weights, 
models which require flnite freezing weights could potentially be difficult. The 
intuitive argument can easily be understood. If two different loops meet at a "frozen" 
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plaquette (i.e. one for which the breakup G** was chosen) , they are glued together. If 
this happens at overly many plaquettes, then the cluster of glued loops which must 
be flipped together can occupy most of the lattice. The flip of such a cluster is not an 
effective move in phase space. It is basically equivalent to flipping all of the (few !) 
spins outside of that cluster. As an example, in ref. [1] we also investigated versions 
of the loop algorithm in which w^^ was (unnecessarily !) chosen finite. Sizeable 
autocorrelations were the result. Minimal freezing, on the other hand, appears not 
to be a problem. (Ref. [102] includes freezing but also a large magnetic field). As 
an example, note that as mentioned in section 2.7 [55], the limiting case J^^y 
of the loop algorithm is the classical Swendsen-Wang cluster algorithm, in which 
"freezing" is the only operation. Yet this cluster algorithm also drastically reduces 
critical slowing down in the corresponding classical models. More general cases with 
minimal freezing have apparently not been tested. For alternatives to freezing see 
also sections 3.3 and 5.3. 

Implementation: Implementation of the loop algorithm in imaginary time is 
actually considerably easier than for local updates, which, especially in more than 
one dimension, require rather complicated local updates [88]. 

In section 2.13 it was explained how the time continuum limit Ar — can be taken 
immediately in the loop algorithm, eliminating the Trotter approximation, reducing 
storage and CPU-time, and thus further extending the accessible temperature range. 
The implementation within the stochastic series expansion appears to be even more 
efficient, because of the discrete time-like variable used there. See section 3.6. 

The loop algorithm can be vectorized and parallelized similarly to the Swend- 
sen Wang cluster algorithm (see e.g. [84,87]). A vectorized version was used in ref. 
[100, 101]. Vectorization or parallehzation of the breakup process is trivial. The com- 
putationally dominant part is to identify the resulting clusters. This is equivalent to 
the well know problem of connected component labeling. See, e.g., ref. [103]. The 
optimal strategies are different from the Swendsen Wang case, because loops are lin- 
ear objects. Efficient parallelization has been discussed by Todo [85,86]. Each of 
Np nodes processes a slice of imaginary time of thickness P/Np, identifying the loops 
that close withing a slice. The remaining unclosed loops are merged gradually by 
combining adjacent pairs of slices and iterating this process in a binary tree fashion, 
which produces only logarithmic overhead. 

3 Operator formulation of the loop algorithm 

A simple and straightforward derivation of the Loop Algorithm can be given di- 
rectly on the operator level, instead of working on the level of matrix elements, as 
we have done so far. We will rewrite the Hamiltonian of our standard example, 
the Heisenberg XXZ model, in terms of loop- operators, which are equivalent to the 
breakups introduced previously. 
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We can then directly write tr as a continuous time path integral over loop- 
operators and spin variables (worldlines). Alternatively, we can express tr er^^ with 
the stochastic series expansion (SSE) [6-8], arriving at a version of the complete loop 
algorithm within SSE. 

Various parts of this formulation have appeared in the literature in different guises, 
especially in the operator formulation (on matrix element level) by Brower et al. [12], 
in the independent work by Aizenman and Nachtergaele on the Heisenberg model 
[10,11], in Sandvik's work on the interaction representation [104] and on "operator 
loop updates" for the isotropic AF [13], in the work by Harada and Kawashima [14], 
and in connection with the meron approach [27-29,31]. 

3.1 Isotropic Antiferromagnet 

The Hamiltonian of the spin | Heisenberg XXZ model on a single lattice bond 
{ij), without field, is given by eq. (2.1) 

Hi^ = US^S^ + SfSf) + J,S^S]. (3.1) 

For the isotropic antiferromagnet Jx — Jz — J > 0, we rewrite 

— jHij + I = —SiSj + i 

= ;^(in)-|iT)) ;^({n|-{iTI) . 

This operator acts towards the right, except for the third line, where we have given a 
worldline-like picture for illustration, to be interpreted as an operator acting towards 
the bottom. We have added a constant ^ to eliminate the contributions of parallel 
spins. We see that the bond operator Hij of the Heisenberg antiferromagnet is — J 
times a singlet projection operator. 
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On a bipartite lattice we can change the sign of (see footnote 4) , obtaining the 
operator Hij, equivalent to Hij, with 

+ j = 7i(ITi) + liT>) i(ail + UTI) 



- l(X - X Mi ) 




On the third line we have again written a worldline-like picture. The arrows (spin 
directions) remain the same on the fourth line, but we have now chosen to connect 
them differently, while keeping continuity of arrows along the connecting lines. As a 

result we sec that the contributions to Hij with nonzero matrix elements are the same 
as those contributing to a horizontal loop-breakup. Thus the horizontal breakup can 
be written as an operator 

H ^ ]^ = (ITi) + liT)) ((Til + UTI) , (3.4) 

with ^B^j a projection operator. After sublattice rotation on a bipartite lattice, the 
energy- shifted Heisenberg bond- operator is therefore the same as a horizontal breakup, 
on an operator level. 

In passing, we also note that the vertical breakup of the loop-algorithm is just the 
identity operator 

) ( ^ % ^ (3.5) 

The partition function eq. (2.3) of the antiferromagnet (J > 0) on a bipartite 
lattice becomes 

Z = tre-^^ = e'^^^W)^^^"^) . (3.6) 
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3.2 Isotropic Ferromagnet 

The ferromagnet can be treated in the same way. Now Jx = Jz = J < 0, and 
~\T\ V + 4 ~ SiSj + ^ 

= l(X + X + tt + ii ) (3.7) 

- K X + X + X + X ) 

Here the constant | was chosen to ehminate the contributions of antiparallel spins. 
On the third hne we have again written a worldhne-hke picture. The arrows (spin 
directions) remain the same on the fourth hne, but we have again chosen to connect 
them differently, while keeping continuity of arrows along the connecting lines. As 

a result we see that for the isotropic ferromagnet on any lattice, the operator Hij, 
after an energy-shift, is proportional to the operator Bf- for a diagonal loop- breakup, 
which indeed is the permutation operator. 

The partition function eq. (2.3) of the ferromagnet on any lattice becomes 

Z^tre-^^ = e^'-^^^.^^^'^-'^\ (3.8) 

Note that the difference between antiferromagnet and ferromagnet is connected to re- 
quiring positivity of the final exponent for the partition function, leading to horizontal 
breakups for the antiferromagnet and diagonal ones for the ferromagnet. 



3.3 Anisotropy 

To treat models with anisotropy A — Jz/\Jx\-i we can use the operator identities 



4 5f5| = X - ^ ' (3-9) 



which follows by subtracting eq. (3.3) from eq. (3.7), and 
For the antiferromagnet on a bipartite lattice we get 



2 [sfSJ + STSf) = X + X - 1 • (3-10) 



The anisotropic ferromagnet on any lattice is given by the same equation, with A < 0. 
These results are equivalent to eq. (2.39). They provide positive weights when | A| < 1. 
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Freezing and alternatives 

For A > 1, i.e. the Ising-like regions of parameter space, the operator formulation 
leads to, e.g., the following approaches: 

(1) One possibility is to use {S-Sj ± |) as an operator, with weight A — 1. Then 
on bonds where this operator acts, the weight of a configuration with AF (or 
FM) neighboring spins is zero, i.e. forbidden. This amounts to an operator 
formulation of freezing of the opposite spin orientation (see section 2.5). 

(2) Alternatively, one can use S^Sj as an operator, and proceed similarly to the 
case of merons (section 4.8). The flip of a loop connected to some other loop by 
an odd number of such operators will result in a sign flip for the total weight of 
the configuration, and therefore in a combined contribution of zero to the total 
weight. Thus only configurations with an even number of such connections 
contribute, while keeping loops independent. 

(3) An interesting different alternative has been developed by Otsuka [105]: He 
treats (A — l)SfSj by introducing Hubbard- Stratonovich variables, which act 
locally like a magnetic field. Then loop flips remain independent, with a flip 
probability that depends on the Hubbard- Stratonovich conflguration. 



3.4 Treating e as a continuous imaginary time path integral 

We can obtain the path integral in continuous imaginary time by directly employ- 
ing a Poisson process representation of e~^^ , as specified rigorously by Aizenman and 
Nachtergaele (section 2.2 of reference [10]). For related work see [89,90]. Rephrased 
for our context the statement refers to a Hamiltonian of the form 

H^-J2Jbk (3.12) 

b 

with bonds b, nonnegative couplings J(,, and selfadjoint operators hb, which acts on a 
finite system. Then e"^^ can be expressed as a Poisson integral: 

= e^^^'nimM^o (n6{(l - Jb^t) + Jbhb^t}y"'' 
= e^E.^^/p(rfcu) n* k , 

where H* hh is a time ordered product of bond operators hh-, is the bond configu- 
ration, and p{duj) is a poissonian probability measure with density Jli Jbdt. Thus, we 
get a random countable collection of time-indexed bonds which occur independently 
in disjoint regions of spacetime. This is a non-commutative version of the usual power 
series expansion of the exponential function. 
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Figure 9: Configuration of horizontal loop operators (left), and a compatible loop and (world) line 
configuration (right). 

Loop operator representation 

If we choose to represent the antiferromagnet as a sum over horizontal loop oper- 
ators, 



then the theorem states that we can represent as stochastic integral over con- 

figurations of loop operators B^j which appear with constant probability density J/2 
in continuous imaginary time. In between these operators there is just the identity 
operator, to which we have been referring as a vertical breakup. Quantizing in 5*^ ba- 
sis, one obtains continuous lines between the loop operators. Finally, taking the trace 
of e~^^ provides periodic boundary conditions in imaginary time for the resulting 
configuration of horizontal loop operators and lines. Since these lines do not start 
anywhere, they close to form loops, with two possible orientations of arrows (spins) 
on each loop. When the spin points upwards, these lines are worldlines. From eq. 
(3.3) we see that all matrix elements are unity, so the total weight of the configuration 
(once stochastically chosen) is unity. 

We have arrived at the combined loop-operator and spin representation of the 
Heisenberg model in continuous time. From this representation we can immediately 
derive the loop algorithm as a Quantum Monte Carlo procedure by switching back and 
forth between (i) choosing a new operator configuration compatible with the current 
spin configuration, with probability density J/2 for each operator, and (ii) choosing 
a new spin configuration compatible with the current operator configuration. This is 
the same procedure as that derived in section 2.13. 

When we sum over all possible spin configurations, two for each loop, we obtain 
the pure-loop representation of the Heisenberg model (see section 2.10). 



H + const 




(3.14) 
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Standard hopping representation 

We can also stay in the standard representation eq. (3.1), in which the operators 
S^S^ correspond to hopping of worldlines, and 4:S-Sj it 1 is diagonal, both with 
matrix elements unity or zero. 

Then we obtain a standard worldline representation, but directly in continuous 
time. Thus the existence of a continuous time representation is independent of 
whether one uses a loop representation [89]. Here one can perform, e.g., something 
like the usual local updates of standard worldline Monte Carlo, e.g. by proposing to 
shift an existing hopping operator S'^S" by some time At, uniformly chosen within 
a suitable range. 

3.5 Treating e~^^ by stochastic series expansion 

The stochastic series expansion of an operator e~^^ , was introduced by Anders 
Sandvik [6-8]. It is related to the Handscomb method [77-81] for the Heisenberg 
model, but also applicable to many other models. We assume again that H = J2u b Hu,b 
is a sum of operators Hj^^i, living on bonds h. (Site terms can also be brought into this 
form) . SSE proceeds from the power series 

e-^^ = , S {-HT 

= E^=i (-^1 -H,- ...)(-H, -H,- ...)... (3.15) 

where j is a shorthand for (u, b). The third line is obtained by expanding the product 
in the second line, resulting in a sum with one sub-Hamiltonian Hj^ from each of the 
n factors. For a fixed value of n, we obtain a sum over sequences of n operators Hj^. 
There is now a discrete index / = l..n with "time- like" properties, since it describes a 
sequence of Hamiltonians. At each value of I, only one of the sub-Hamiltonians acts. 
The partition function becomes 

z - EEE^ Hn(-^(^,^)JN (3-16) 

a n=0 Sn 

where Sn is a sequence ((z/, 6)i, (i/, 6)2, (z^, 6)n) of operator indices. We insert S^ 
eigenstates for [o;;) between all operators. The result is again a worldline-like rep- 
resentation of tr e~^^, this time with a discrete index space instead of continuous 
imaginary time. 

Let us emphasize that the stochastic series expansion, as presented so far, is a 
different treatment of e~^^, independent of the choice of sub-Hamiltonians and of 
any Monte Carlo procedure. We also note that the continuous time representation 
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eq. (3.13), with a time ordered set of sub-Hamiltonians, and the stochastic series 
expansion are indeed closely related. The biggest difference is the discreteness of 
index space in case of SSE. This discreteness increases the efficiency of SSE with 
respect to continuous time. 

Standard hopping representation 

If we choose the standard hopping representation eq. (3.1), then local updates of 
the operator configuration, as specified in the original SSE papers, are the natural 
choice. 

Loop operator representation 

If instead we choose a loop operator representation like eq. (3.14), then we arrive 
again at a combined worldline/loop-operator representation of tr e~^^ . Switching 
back and forth between choosing operators and choosing worldlines results in the 
loop algorithm, this time in discrete index space. For the isotropic Heisenberg anti- 
ferromagnet and ferromagnet, a variant of this procedure was introduced by Sandvik 
as "operator loops with deterministic updates" [13, 32], developed from a different ap- 
proach (section 5.2). Prom the present formulation we see that indeed the complete 
loop algorithm can be applied within SSE, including anisotropic cases and general- 
izations. 

3.6 Loop algorithm in SSE 

Let us describe a possible implementation of the loop algorithm with SSE in 
more detail. The continuous time version was described in section 2.13. For further 
information, we refer to the literature on SSE [6-8, 13, 48, 97]. 

We describe the case of an anisotropic Heisenberg model on a bipartite lattice. 
From eq. (3.11) we get (with J^. = 1 as unit of energy) 




(3.17) 



=: JhB^ + JdB'^ + eb 



with |A| < 1, e > and a "diagonal operator" D = \. We have added a constant 
to Hij to ensure positivity of matrix elements. The constant e might be varied to 
further improve convergence [48]. 
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Figure 10: Part of an SSE configuration in loop-operator representation. The diagonal operator 1 is 
drawn as a vertical breakup. A compatible spin-configuration is denoted by the arrows. The vertical 
axis marks the propagation index I, the horizontal axis numbers bonds b. The sequence {i^,b) is 
given on the right, with 1^=1 (j/ = 2) denoting a vertical (horizontal) breakup. 

It is advantageous to extend the SSE sequence to a fixed length L by introducing 
additional "empty" unit operators = 1. Then [8] 

^ = E E ^^^^^-f^ {a\ n(-i^M)J l«) (3.18) 

a Sl 1=1 

with n the number of non-unit operators in the sequence Sl- This can be done 
without approximation in practice, since the average length of the operator string is 
just the total energy of the system, as seen by differentiating with respect to P, and 
its fluctuations are flnite in any flnite run. A value for L with a safe margin can be 
determined during thermalization. 

The SSE conflguration is specifled by Sn, which can be written as a sequence of 
tuples {u, b), with (0, 0) denoting h^. When the operators Hi^u, b) are "non-branching", 
it suffices to specify the initial state a(0). Otherwise (as for horizontal loop-operators), 
the "worldline configuration" |a(/)) has to be stored, too. A picture of an SSE 
configuration is shown in figure 10. 

The simulation begins with, e.g., a configuration of all down spins and Sn consist- 
ing of only empty operators. 

An overall Monte-Carlo-update consists of three parts. First, empty operators h'^ 
and diagonal operators operators D are exchanged with Metropolis probability. 

M (0,0). -(1,6)0 = min (l, Mf^g)^) 

( (3.19) 

' Nbl3{a{l)\Db\ail)) J 




p((l,&)/-^ (0,0)0 = min 
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where Ni, is the total number of bonds and u — 1 stands for the diagonal operator 
D. This step changes the expansion order n by ±1. It is attempted consecutively for 
each I = 1, L. 

In the second step, the "breakup decision", diagonal operators D and loop- 
operators S'*, B'^ are exchanged. Now only the "vertices" with non-empty operators 
matter. As in the case of continuous time, one needs a doubly linked list connecting 
each of the four "legs" of each vertex with the corresponding leg of the next (resp. 
previous) vertex. The information on spins can be stored with the vertices. 

As usual at each vertex, the local spin configuration allows two of the three 
breakups, vertical (operator £>), horizontal B'^ or diagonal B'^. The choice is made 
at each vertex with heat-bath probability. Thus on an antiferromagnetic vertex (like 
plaquettes 2=*= in fig. 2), the horizontal breakup in our example is chosen with proba- 
bility 

^ + ^,, (3.20) 
4e + (1 + A) ^ ^ 

and on a vertex with spin-flip (like plaquettes 3^) it is chosen with probability 

TT^^^i^-ia + A). (3.21) 

4 



;i-A) + i(l + A) 2' 



The third step consists as ususal of identifying and flipping loops. Nontrivial 
diagonal operators as well as external flelds can influence the flip probability. Note 
that on sites on which no operators act there is a straight loop. 

The "deterministic operator-loops" previously constructed [13] for isotropic ferro- 
and antiferromagnet are somewhat different. They use a different diagonal operator 
in eq. (3.17) which forbids ferromagnetic vertex conflgurations. The generalization 
of the loop algorithm just outlined has lower autocorrelations, as recently shown by 
Syljuasen and Sandvik [48]. 

We have described the multi-cluster loop-algorithm. In the single duster version, 
loop construction starts at a randomly chosen vertex-leg, and decides breakups iter- 
atively until the loop closes. (This construction generahzes to the directed loops in 
section 5.3, which are better suited, e.g., for large magnetic flelds.) 

Measurements of static obscrvablcs are straightforward [104]. For example, the 
total energy is given hy E = ^ (n) and the heat capacity by 

Cv = (n') - {nf - (n) . (3.22) 

Time-dependent correlations are also available [8]. An imaginary time separation r 
corresponds to a binomial distribution of propagation distances Al, and correlators 
are given by 

(i(r)S(O)) = (± f;,) (l-^y ^'c^nm) (3.23) 



The Loop Algorithm 



46 



with 

CABm = -^jlB{l + AO A{1) . (3.24) 

Improved observables are available as usual. An efficient method to measure time- 
dependent Greens functions has been provided by Dorneich and Troyer [97]. 

3.7 Discussion 

We have seen that the issue of representing tr e~^^ by a (continuous or discrete) 
imaginary time path integral, or by SSE, is independent of the choice of representation 
of H, e.g. in terms of traditional hopping operators, or in terms of loop operators, 
and independent of the resulting possible update procedures. The loop operator 
representation of H and the full loop algorithm can be applied both in the imaginary 
time path integral and in SSE. Additional operators in H can be treated similarly (see 
sections 3.3,4.4). Because of its discrete index space, the stochastic series expansion 
appears to be somewhat more efficient computationally than continuous imaginary 
time. 



4 Generalizations 

So far we have purposely restricted ourselves to XYZ-likc models in order to sim- 
plify the presentation. This covered both the case of spin | quantum spin models, 
where we have inserted eigenstates \S^i) = |±1) (or eigenstates along a different quan- 
tization axis) and models of fermions or hard core bosons, where we have inserted 
occupation number eigenstates [5]. We have developed the formalism for the gen- 
eral anisotropic XYZ-like (eight-vertex-like) case. We have computed explicit update 
probabilities for all XXZ and most XYZ-like cases. 

Let us now describe further generalizations, several of which are immediate. For 
all generalizations, with a slight modification for continuous time (section 2.13), it 
remains true that locally on the vertices we have a situation like in the six- (or eight-) 
vertex model, so that the loop formulation, on the level of matrix elements or on the 
level of operators, described above can be applied directly. 

4.1 Long range couplings 

Hopping or spin-spin interactions beyond nearest neighbor can be handled by the 
same approach as higher dimensions, namely by introducing extra parts H,j in the 
split of the Hamiltonian, i.e. extra "bonds'' on the lattice, with a corresponding set 
of shaded plaquettes hving on separate Trotter time subshces. 
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When the number of additional couphngs is large, this approach becomes im- 
practical. It is somewhat less cumbersome in the continuous time version of the 
loop algorithm (section 2.13). A stochastic approach by Luijten and Blote to extend 
cluster simulations to such cases may be helpful here [106]. 

4.2 Bond disorder, diluted lattices, and frustration 

Bond disorder refers to spatial variations in the spin couplings Jij (resp. hop- 
ping strengths tij and/or density-density couplings Vij). This modifies the loop- 
construction probabilities locally, making them plaquette- dependent. Otherwise noth- 
ing changes ! (One does need to check whether ergodicity is still achieved.) 

The same is true for diluted lattices, which can be viewed of bond 

disorder in which the coupling vanishes completely on some bonds. 

The situation is different for frustrated couplings whence some matrix elements 
become negative and cannot be transformed to a positive representation. Whereas 
the loop algorithm itself is unaffected, this produces a second type of sign problem 
which needs to be handled in the same way as the fermion sign problem (section 4.7). 
If the strength and/or frequency of frustrated matrix elements, as well as system 
size and (3 are not too large, this sign problem can remain manageable [42, 107]. In 
general, however, it has so far precluded simulations of strongly frustrated models. It 
is possible to find improved estimators for frustrated couplings [42], to alleviate this 
sign problem. Note that for non-bipartite lattices, the breakup probabilities have to 
be modified to ensure ergodicity of the algorithm, as has been discussed in section 2.9. 
This modification might introduce additional freezing, which, however, is dwarfed by 
the sign problem. 

Promisingly, the mcron-strategy outlined in section 4.8 can also be applicable 
to frustrated systems. Indeed, this way Henelius and Sandvik have succeeded in 
simulating a "semi-frustrated" Heisenberg quantum spin system without sign problem 
[32] , though at present other frustrated model. In special cases, one can also find a new 
basis without sign problem, e.g. for a Trellis lattice with special range of couplings 
[108,109]. 

4.3 Asymmetric Hamiltonians: Magnetic field, chemical potential 

Asymmetric weights can be caused by a (uniform or random) magnetic field in 
S'^-direction or equivalently a chemical potential, or by other non "particle-hole- 
symmetric" terms, like e.g. softcore bosons. Large diagonal fields seriously affect 
the performance of the loop algorithm. Transverse fields, however, are easily handled 
(see section 4.4). 

Within the algorithm presented so far, asymmetric parts of the Hamiltonian need 
to be taken into the global weight Agiobai, such that W^''"''^ is symmetric with respect 
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to a flip of all four spins on each shaded plaquette. Then they contribute to the flip 
probabilities pfup of the clusters. 

A magnetic field h (or chemical potential fi) affects only clusters which change 
the number of worldlines, i.e. which wind around the lattice in temporal direction. 
The acceptance rate for the flip of such clusters is proportional to exp(— /^/in^t,), 
where is the temporal winding number of the cluster. At sufficiently large val- 
ues of (3h the acceptance rate becomes very small on average and results in an 
exponential slowing down of the simulation. This was indeed observed [97, 110, 111]. 
For such large diagonal fields, the "worm-algorithm", and especially the general 
method of "directed loops", discussed in section 5, are much better suited. When 
comparing computational effort for equal error bars for one- and two-dimensional 
Heisenberg systems, the threshold in comparison to SSE worms (see section 5.2) 
was f3h^3 [97]. When comparing to the worm- algorithm for a Heisenberg-chain, the 
threshold was at ph>5 [111]. 

On the other hand, at small fields (5h^l it appears to be advantageous to use the 
loop algorithm with global acceptance step [97] to achieve equal errors with smaller 
computational effort. 

To minimize the acceptance problem, one should normally choose Agiobai such that 
its fiuctuations are minimized. Somewhat surprisingly, it has also been reported [112] 
that supplementing the cluster updates with explicit global flips of worldlines helps 
considerably, even at rather larger values of fih. An interesting variant of the loop 
algorithm which appears to work well for very large diagonal flelds was introduced 
by Syljuasen [113]. 

Let us note that the canonical ensemble (or ensemble of constant magnetization) 
is simulated by disallowing the flip of clusters which would change the number of 
worldlines. It is also possible, and can reduce autocorrelations, to allow the number 
of worldlines to fluctuate, and afterwards treat each subset of conflgurations with a 
certain number of worldlines as a canonical simulation [64,65]. 

4.4 Transverse field 

There is a different way to treat magnetic fields and chemical potentials, which 
can be simulated without difficulties. One can change the axis of quantization to turn 

a magnetic field into the x-direction. The field operator = | ( ) then fiips the 

direction of a spin. It can act at any spacetinie point (i/). Worldlines now become 
piecewise continuous, with spin flips whereever acts. For the isotropic Heisenberg 
model, the physics is invariant under this rotation. 

Rieger and Kawashima [17] have shown how to treat such source operators in a 
cluster algorithm^^. They investigated the Ising model in a transverse fleld, for which 

^^Note that with sources, local updates allow the change of global quantities and become ergodic 
too. See also section 5. 
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Jx — and all worldlines are thus straight. Here we apply their method to the loop 
algorithm. 

We enlarge our graph to consist not only of breakups but also of the information, 
for each spacetime site, whether acts there or not. occurs in matrix elements 
hke Wfi = (+1 exp (ArhS^)]—) = sinh {^h). Its presence or absence in a graph can 
be treated with Metropolis probability^^: Tentatively put additional operators 
with probability^^ p — Wh on any lattice site where there is no yet. After choosing 
breakups as usual we now have segments of loops between successive occurences of 
sources S^. Flip each scgment^^ with probability | (modified with any remaining 
external weight Agiobai )- Finally, remove the sources between segments of equal orien- 
tation. This completes one update. A generic framework for such tentative updates 
has been provided in ref. [114]. 

Now the magnetic field (chemical potential) can be of arbitrary size, without 
acceptance problems (section 4.3). We can measure off-diagonal Greens functions 
in directly in the worldline representation, as well as in the loop representation 
(section 4.6). When the original model is not isotropic, one can still rotate the 
magnetic field into the x direction, obtaining an anisotropic XYZ model (sec end of 
section 2.1). The loop algorithm for this model (section 2.7) involves breakups for 
which the arrow direction changes on both loop segment, which is equivalent to the 
original breakups plus two sources. 

The procedure just described works in the ferromagnetic case. For an antiferro- 
magnet, we have to perform a rotation S^'"^ —S^^y on one sublattice to achieve 
positive matrix elements in eq. (2.6). This minus sign stays with the source opera- 
tors S^, multiplies the configurational weight W{Sy^ and results in a sign problem. 
Chandrasekharan, Scarlet, and Wiese [18, 19] showed that this sign problem can be 
removed completely with the meron approach described in section 4.8. To include 
sources, they use a method similar to that of Rieger and Kawashima. They introduce 
additional time slices and employ the Swendsen-Wang algorithm [15, 60] to set or 
unset bond-variables (as part of loops) which force equal spins along the additional 
timelike lattice bonds. This subdivides the loop-clusters and is equivalent to the 
Metropolis procedure described above. They show that for large fields, the algorithm 
with transverse fields performs orders of magnitude better than the original one with 
diagonal fields. 

The strategy described here, namely to include operators stochastically, resem- 
bles their occurence in the stochastic series expansion (section 3.5). Indeed, one can 
also separate other operators from the Hamiltonian and include them stochastically 
in the same manner as S^. (See section 3.3). 

^^I.e. propose creation of new operators with p = Wh, and deletion of existing ones with p= 1. 

^"^In continuous time [17] this become constant probabiHty Wh/^r per time. 
"'^^When there is freezing, we similarly have to flip subgraphs of clusters. 
^^I thank Matthias Troyer for this observation. 
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4.5 Higher Spin representations 

The loop algorithm for spin models has so far been formulated for the spin-^ 
case. One way to extend it to higher spin representations would be [1] to use the 
corresponding vertex representation (19- vertex model for spin-1) and to try the same 
formalism as for spin-|. 

Kawashima and Gubernatis have successfully employed a different approach [9, 
33]. They write higher spin representations as a product of spin-i representations, 
with a projection operator onto the proper total spin. They arrive at new "shaded 
plaquettes", between the different spin-| representations at each space-time site. Lo- 
cally on each plaquette, the situation looks again like a six- or eight- vertex model. 
By the same approach, Kawashima also treated the anisotropic XYZ case for general 
spins [34]. (Here the number of different graphs quickly proliferates). This general- 
ization was successfully tested on an antiferromagnetic Heisenberg chain with S = 1, 
finding complete removal of autocorrelations [33]. 

Since the Hamiltonian commutes with the projection operator, the procedure 
can be simplified [35-37] by projecting onto total spin S only at time zero. For 
general spin S [36,37] there are again 25* spin-variables per site. Breakups with 
the usual probabilites (section 2.7) are allowed between any pair of spin variables 
from neighboring sites. Projection is achieved at time zero by symmetrizing over 
permutations of the worldline variables at each site, subject to worldline continuity. 
This procedure immediately generalizes to continuous time (section 2.13) [35,36]. 

Harada and Kawashima [115] recently improved this approach in the framework 
of directed loops by adjusting the algorithm to reflect a stochastic mapping from the 
space of 2S spin-| variables to a single variable e {1 ... 25" -|- 1} (see section 5.3). 

4.6 Off-diagonal operators 

Brower, Chandrasekharan, and Wiese showed [12] that one can obtain off-diagonal 
greens functions, especially two-point functions, by measuring properties of the loop 
clusters, i.e. improved estimators, instead of properties of the worldline configurations. 
We first discuss the off-diagonal two-point function (S^Sj) between spacetime sites 
It corresponds to a configuration with an additional propagator from i to j, i.e. 
with a partial worldline. Such a configuration never occurs during the simulation. 
Thus, in a regular worldline Monte-Carlo with continuous worldlines, off-diagonal 
Greens functions cannot be measured at alP°. Instead, one would have to perform 
a separate simulation with fixed sources for each pair of sites and somehow 

measure and adjust the normalization .^sources ij/'^no sources- This is not feasible with 
the usual worldline method. Note, however, that off-diagonal two-point functions can 
be measured directly in the extended ensemble of worm methods (section 5), both 

^•^One can measure equal-time correlation functions by introducing open b.c. on one timeslice 
[116]. 
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with imaginary time and with SSE. They are also directly accessible when there are 
transverse fields (section 4.4). 

In the loop algorithm, (S^S^) is easy to measure even without transverse fields. 
Let us look at a given configuration of loops. If spacetime sites i and j are connected 
by a loop, we could flip the partial loop from i to j (or from j to i, depending on 
the current spins at to obtain the desired propagator, and thus a contribution to 
{S^S~). Note that we do not actually need to perform such a partial loop flip as a 
Monte Carlo update, we can just virtually do it to perform the measurement as an 
improved estimator (section 2.14). When there is no external asymmetry Agioi,ah the 
proper flip probability for use in the improved estimator is | for the partial loop, and 
the improved estimator for S^S~ is 



,/A+c-\ _ j ^ ' ^ij^ if the sites iJ are on the same loop, . 
)impr - < Q otherwise. ^ ^ 



(For the antiferromagnet a phase factor (p^j — (^—iy^i-^i\ appears due to the sublattice 
rotation (footnote 4); (pij = 1 otherwise.) This resembles the improved estimator eq. 
(2.52) for the diagonal correlation function. There the criterion for a finite contri- 
bution is that the two sites are located on the same cluster, whereas here it is the 
same loop. For uniform correlations of the isotropic Heisenberg ferromagnet and for 
staggered correlations of the antiferromagnet, the improved estimators eq. (2.52) and 
eq. (4.1) are identical, directly reflecting spin rotation invariance. Note that by us- 
ing the improved estimator, one can simultaneously measure S^Sj' in a given cluster 
conflguration for all pairs of sites When there is an external weight Agiobai, the 
improved estimator eq. (4.1) has to be modified, as described in section 2.14. 

Brower, Chandrasekharan, and Wiese [12] show that improved estimators exist 
for general Greens functions, by developing an operator description of loop clusters. 
In their formulation, each specific plaquette breakup is written as a local transfer- 
matrix operator T that evolves worldline spins across the plaquette in time direction. 
The operator is specified by its matrix elements, which are linear combinations of 
Kroneckcr-deltas for the spin values, reflecting the specific breakup. For an explicit 
representation in terms of spin operators, see section 3. The important point here 
is the existence of such an operator. A given graph G then corresponds to an oper- 
ator Mg, with trSpinsMc = 2^""'"''^'' for the fieldless case. The partition 
function is Z — J2g '^(^) trMo (see eq. (2.43)), where w{G) is the product over pla- 
quettes of the breakup-weights w^^ occuring in G. In this operator language in graph 
space, we can now also write expectation values: 

(O) = ^T.^{G)tr (OMg) = {^-^^^)MonteCarlo, (4.2) 

where the trace is over spin configurations, and the last average is taken over the clus- 
ter configurations generated in a Monte Carlo simulation. For the two-point Greens 
function, O consists of two source terms, and we get the improved estimator eq. (4.1). 
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Similar results obtain for general n-point Greens functions, where now contributions 
can come from more than one cluster. A four-point function {S^S~ S^Sf), for ex- 
ample, gets contributions from sites i,j, k, I which are (1) all on the same loop, or (2) 
on two different loops, with pairs of sites (j, k) or (i, /), (j, k) being on the same 
loop. For some further details we refer to refs. [12, 117]. We also note, that ostensibly 
four-point properties can sometimes be reduced to two-point functions, as shown for 
the Drude-weight in [117, 118]. 



4.7 Fermionic Models 



Fermionic models can be treated in the same way as hard core bosons [5], with 
the addition of a fermionic sign for each permutation of worldlincs [116]. The Monte 
Carlo simulation is performed with the modulus of the configuration weight, 

and observables are determined as 

{0^_ (4.3) 

where sign is the sign of W{S). Since^^ {sign) ~ exp {—const ■ j3 ■ Volume)^ simulations 
have so far been restricted mostly to one-dimensional unfrustratcd models and to 
small or very low-doping higher-dimensional or frustrated systems. The identical sign 
problem also occurs for fermion simulations with the loop algorithm (first performed 
in ref. [119])- By clever use of improved estimators, a solution of the sign problem 
was found for a restricted class of models (see section 4.8). This class does, however, 
not so far include the standard Hubbard model nor the t-J model. For these models, 
several generalizations of the loop algorithm have been developed: 

The Hubbard Model can be viewed as consisting of two systems of tight binding 
fermions, each mapping to an XXZ- model (plus fermion sign), coupled by the Hub- 

T I 

bard interaction U J2inlnl . It can therefore immediately be simulated by employing 
a loop algorithm for each of the XXZ models, and taking the Hubbard interaction as 
well as the chemical potential /il]j(nj +n\) into the global weight Agiobai, eq. (2.18). 
However, for large values of \U\ this procedure is not very efficient, since the global 
weight will fluctuate too strongly, resulting in small acceptance rates, especially for 
the flips of large loops. 

Kawashima, Gubernatis, and Evertz [40] therefore added an additional new type of 
loop-update, called loop-exchange, which flips between spin-up and spin-down, leav- 
ing unoccupied sites in the worldline lattice unchanged. These loops move upwards in 
time on spin-up sites, and downwards on spin-down sites. The breakup probabilites 

^^The exponential form results since negative contributions to {sign) originate in finite spacetime 
regions more or less independently, from negative matrix elements and/or winding of worldlines. 
More formally [20], {sign) = exp{—pVAf) corresponds physically to the difference A/ in free 
energy between the fermionic model and the "bosonic" model \W\, whose nature depends on the 
quantization and weight function W that was chosen. 
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Figure 11: Comparison of the loop algorithm and local updates for the ID Hubbard model (32 
sites), adapted from ref. [40]. The figure shows the integrated autocorrelation time t^^^ (which 
is proportional to the computer time required for a given accuracy) for the total energy E, on a 
logarithmic scale. The autocorrelation times for other quantities behave similarly. The upper three 
curves are for local updates, the others for loop updates including loop exchange. In both cases, 
[/ = 2, 4, and 8, with autocorrelation times increasing with growing \J . With local updates the 
autocorrelation times are very large already at small P, and grow rapidly when /3 or C/ are increased 
(roughly consistent with r ~ f3^U^ with z<2 and x>3). For U = 8 these simulations did not 
converge, and only lower bounds for r arc shown. At /3>,1.5 none of the simulations with local 
updates converged. With loop updates the situation improves drastically. All autocorrelation times 
are less than about 2. Thus at large /3 and U several (likely many) orders of magnitude in computer 
time are saved. 

can be constructed with the formahsm of section 2. The loops were chosen to change 
direction (i.e. use a horizontal breakup) when spin-up and spin-down worldlines meet. 
Flips of these loops are not affected by the Hubbard interaction nor by the chemical 
potential, and can therefore always be accepted. 

In the ID Hubbard model this additional type of loop updates eliminated all 
remaining autocorrelations in the Hubbard simulations. An example is shown in 
figure 11. For the loop updates, the autocorrelation times remain smaller than 2 
at all temperatures. No slowing down is visible at all. For local updates, on the 
other hand, the autocorrelation times in figure 11 show the expected rapid increase 
(see appendix B), consistent with t ~ /?^. They are orders of magnitude larger 
than the loop-autocorrelation times, already at small f3 and even for the energy as 
an observable, which as a locally defined quantity is expected to converge relatively 
fast in a local algorithm. Beyond (3^1.5, the local Monte Carlo did not converge 
anymore. The autocorrelation times are expected to continue to grow. Note that the 
autocorrelation times for the local algorithm will additionally grow like l/(Ar)^ for 
improved Trotter discretization, whereas the loop algorithm does not suffer from this 
effect, and can moreover be implemented directly in continuous time. 
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A similar loop- update method was independently developed and recently employed 
by Sengupta, Sandvik, and Campbell [45] within the SSE formalism for a Hubbard 
model including nearest neighbor Coulomb repulsion. The authors also show how to 
apply the parallel tempering technique to such simulations. 

For the t-J model a number of generalizations of the loop algorithm have been 
developed [41-44]. Here we have three kinds of site-occupation: spin-up, spin-down, 
and empty. The model can be simulated by a divide and conquer strategy [41,42], 
using three different types of loop updates. In the first update, the empty sites 
in the worldline configuration are left untouched. The remaining sites with spin- 
up or spin-down can be updated with a loop algorithm very similar to that for the 
spin-| Heisenberg antiferromagnet. In the second update, sites with spin-up are left 
untouched, and updates between spin-down and empty sites are done with a loop 
algorithm (which now looks similar to that for hard core particles). In the third 
update, loops on spin-up and empty sites are constructed. All three kinds of loops 
fall within the XXZ case discussed in section 2.7. The combined algorithm is indeed 
a working cluster algorithm for the t-J model, ergodic, removing autocorrelations, 
and providing improved estimators which further reduce statistical errors. It was 
successfully used in ref. [42] on single t-J chains and on two and three coupled t-J 
chains. In ref. [120] it was used to study thermodynamic and diamagnetic properties 
of the square lattice t-J model with up to 2 holes, including cases with spin-anisotropy. 

Brunner and Muramatsu [43] use another representation of the t-J model, in 
which only holes are fermionic. They appear in bilinear form and can only live on 
spin-down sites. In this representation, two kinds of loop updates therefore suffice. 
The authors study the stability of the Nagaoka state on two-dimensional lattices of 
size up to 10 X 10 at small J, with up to two holes, reaching temperatures down to 
Pt — 2500 for a single hole and pt — 150 for 2 holes at J = 0, much lower than 
previously attainable with other methods. 

Brunner, Assaad, and Muramatsu developed a different very interesting method 
[44]. In the same representation, they simulate just the spin degrees of freedom (pure 
Heisenberg model), and can then calculate dynamical properties of a single hole in 
imaginary time by exactly integrating out the bilinear fermion degrees of freedom. 
The method has been applied to several 1- and 2-dimensional cases [44, 121, 122]. 

For relativistic gauge theories, Wiese et al. [123] have developed the so-called 
quantum-link formalism. They employ an additional artificial "time" direction, for 
which they can apply variants of the loop algorithm, e.g. for U (1) gauge theory, aiming 
at efficient simulations of fermionic lattice QCD. 

In a recent different approach to the sign problem [124], Lee observed that the 
typical distance which fermions "walk" is only of order \/Bt.. In this approach fermion 
permutations are restricted to finite zones, and results extrapolated in the size of these 
zones. 
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4.8 Fermion simulations without sign problem: The meron method 

Chandrasekharan and Wiese [20] discovered an exciting possibility to overcome 
the fermion sign problem for a restricted class of models, by clever use of improved 
estimators. They treat the fermionic version of the spin | antiferromagnetic Heisen- 
berg model, i.e. spinless fermions with nearest neighbor repulsion V — 2t, with 
Hamiltonian 

H = -tT.i4cj + 4c,) + Vj2{h, - ^){h, - ^), (4.4) 

{«i> (*i> 

and the usual relations J^^y = 2t, — V, rii — j — Sf, and = 0, 1 between particle 
language and spin language. 

Then there are only vertical and horizontal breakups (section 2.8), no diagonal 
ones. In this case one can write down an improved estimator for the fermion sign 
in which the contribution from each cluster is independent. Namely, the fermion 
sign of the worldline configuration changes by a factor (_i)i+"ii'+";i/2 ^hen flipping a 
cluster, where is the temporal winding number of the cluster, and Uh is the number 
of horizontal moves in the cluster. They denote by "meron" a cluster for which 
this factor is ^1. Without chemical potential, each cluster can flip independently 
with propability |. Each meron then contributes a factor ((+1) + (— 1))/2 = to 
the improved estimator for the sign, which is therefore zero whenever there is at 
least one meron. When there is no meron, the improved estimator is +1, since for 
the Heisenberg antiferromagnet any allowed loop configuration can be constructed 
from the reference staggered worldline configuration (see section 2.14), which has 
sign = +1, and since without merons, all configurations contributing to the improved 
estimator have the same sign. Thus, 

u, ll'meron 

has no negative contributions. In a standard simulation, it is however dominated by 
contributions 0, and still has an exponentially bad variance. 

Suppose we want to measure staggered two-point functions like O — {—Vf~\ni — 
|)(nj — \). To use eq. (4.3), we determine the improved estimator eq. (2.52) for 
[O ■ sign). We can evaluate it by using the reference configuration. It is equal to ^ in 
two cases: (i) when there are no merons and i and j are on the same loop; (ii)when 
there are two merons and i is located on one, j on the other meron; otherwise it is zero. 
For the susceptibility X = ^ (O^) of the staggered occupation O — Z^i(— l)'(rij — |) 
we use eq. (2.58) for the improved estimator and get 

S (clusters c) 1*^1 ~^ |Cl| |C2|) 

" {{sign)^^^,) - 4M2Vol {Sn„,o) 

(4.6) 
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which has contributions only from the zero- and two-meron sector. Here the meron- 
contribution obtains a factor 2 since sites i and j can be on different merons two 
ways. 

Chandrasekharan and Wiese point out that one can therefore restrict simulations 
to rimer on < 2, thereby drastically reducing the visited phase space. This is achieved 
by starting with the reference worldline configuration and with completely vertical 
breakups. The breakups are then updated with the usual probabilities plaquette by 
plaquette, subject to the additional constraint that the total number of merons in a 
cluster configuration can at most be 2. Each plaqutte update requires that one follows 
the partial loops attached to that plaquette to determine their meron nature. Since 
the typical loop size increases with growing spatial and temporal correlation length, 
this step increases the computational cost, but at most by a factor proportional to 
space-time volume^^.The resulting loop clusters are flipped (e.g.) after all plaquettes 
have been updated, to get to a new worldline configuration (the sign of which is 
irrelevant) . 

The denominator in eq. (4.6) can be viewed as the remaining effect of the fermionic 
sign. This denominator would up to now still be very small. This can be avoided 
by reweighting the simulation so that roughly half of the generated cluster configu- 
rations have zero merons. The corresponding reweighting factor has to be cancelled 
when computing expectation values [20] . All n-point functions can be computed sim- 
ilarly by including 0{n) merons. Overall, the total computational effort per sweep is 
proportional to between Vj and V^^, where Vt is the spacetime volume, with strictly 
positive estimators and no exponential sign problem remaining. The sign problem for 
this model is thus solved ! 

For the meron approach to work we need a multi-cluster method with two stringent 
conditions. Loop-flips must be independent in their effect on the worldline-sign. For 
the tl^-model this means that there cannot be any diagonal breakups, so we need 
V > 2t; and there must be a reference worldline conflguration in the sense of section 
2.14, with positive worldline sign. We also need to ensure ergodicity. 

These restrictions unfortunately exclude most interesting models at present, hke 
the standard Hubbard and t-J models. Nevertheless, valuable progress has been 
made. Chandrasekharan [22] developed a meron method for free spinless fermions at 
a large chemical potential. In = 2, this implies a fllling of 2% at /3 = 3, becoming less 
with larger /3. This method uses only free fermions. Chandrasekharan and Osborn 
[27-29] succeeded in a meron method for a non-standard type of Hubbard model, 
with unusual correlated hopping, constructed by requiring that the clusters for up- 
and down-spins should be identical; then the reference conflguration automatically 
has a positive sign. They studied an attractive Hubbard model of this class on lattices 
as large as 128 x 128 and flnd critical behaviour that is well described by a Kosterlitz- 
Thouless transition. The method is also available for repulsive models with hmited 
chemical potential. For a review, see ref. [31]. 

^^This cost has been reduced to logarithmic by a binary tree search [27] 
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The meron approach can also be apphed to other sign problems. Chandrasekha- 
ran, Scarlet, and Wicsc [19] used it successfully for antiferromagnetic Heisenberg 4-leg 
ladders in a transverse uniform magnetic field (see section 4.4). The same technique 
was used for the tV model in a staggered field [23,24]. Henelius and Sandvik suc- 
ceeded in applying the meron technique to a "semi-frustrated" Heisenberg quantum 
spin system. They also discuss details of the implementation and optimal reweighting. 

5 Related Methods 

The main limitation of the loop algorithm is its exponential slowing down for di- 
agonal magnetic fields (or chemical potentials) (3h^3. This problem can be overcome 
by "worm-like" methods, which extend phase space by source operators and perform 
/oca/ updates which eventually combine to a permissible update in the original ensem- 
ble. We first describe the original worm algorithm, then the so-called 'operator-loop 
updates' in SSE, for which one limiting case are "deterministic operator loops" . Re- 
cently, the very capable worm-like method of "directed loops" has been constructed. 
In limiting cases it becomes similar to single-loop construction in the loop algorithm. 
It is briefly described in section 5.3. 

5.1 Worm Algorithm 

In the standard worldlinc formulation with local updates it is almost impossible to 
compute single particle Greens functions like e.g. {a{xo, to) a'^{xi,ti)) (where a"!", a are 
creation and annihilation operators). In order to do so, one would have to introduce 
sources at (a;o,to) and {xi,ti) explicitely, with a partial worldlinc between these two 
points, and to perform a separate simulation for each such pair of coordinates (see 
also section 4.6). 

A very elegant solution to this problem was provided by Prokof'ev, Svistunov, 
and Tupitsyn [47]. Their method can be viewed from the perspective of single loop 
construction. During that construction, there is a partial loop with two open ends. 
Flipping this partial loop would result in a partial worldlinc, i.e. a propagator between 
two sources, just as desired. Thus every step in a single-loop construction can be taken 
to provide a configuration for the measurement of Greens functions (see also section 
4.6). 

Prokof'ev et al. turn this observation around and explicitely construct a single 
propagator with two ends ("Worm") in continuous time, related to the way a single 
loop would be constructed. The Monte Carlo moves are thus local in space and in 
time (within a constant neighborhood, see section 2.13). Each local step provides a 
new configuration to the measurement of Greens functions. When the sources meet 
and annihilate (equivalent to the closing of a single loop), contact is made to the 
sourceless partition function, thus providing the correct normalization. Prokof'ev et 
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al. supplement these moves by additional moves corresponding to the flip of small 

closed loops, in order to make the simulation faster. The Worm-algorithm is available 
for any spin-magnitude. It is ergodic in the same way that the loop algorithm is. It 
is expected to have a small dynamical critical exponent [111]; On a one- dimensional 
chain, z was measured to be close to zero [111]. 

A very important advantage of the local updates is that all interactions in the 
Hamiltonian, like e.g. magnetic fields, can be taken into account in each step, without 
encountering prohibitively small acceptance rates. This is in contrast to the loop 
algorithm itself, which has to put unsuitable interactions into the global weight Agi^hai 
(see section 4.3). 

Prokof 'ev et al. have apphed the method very successfully to the ID Bose Hubbard 
model [47, 125, 126] with soft core bosons, also with disorder, to the 2d t-J model with 
a single hole [127], to the 3d Hubbard model in an optical trap [128], as well as to 
other problems including [129-133]. 

The worm-method has also been adapted to classical spin models [134]. 

5.2 SSE with "operator loop updates" 

Sandvik developed a worm-hke method called "operator loop updates" [13] within 
the stochastic series expansion. It can be seen as a special case of the new directed 
loops discussed below, for which the new standard solution is still more efficient. It 
has been used to study a variety of models with fields and other asymmetries, like 
two-dimensional hardcore bosons with chemical potential and/or next-near neighbor 
repulsion [135-138] and numerous others [32,45, 139-147]. 

In the special case oi h — and A = ±1, operator loop updates become "de- 
terministic": Once a set of vertices has been constructed in the first update step of 
the SSE procedure (see section 3.6), each vertex then becomes a horizontal (resp. 
diagonal) breakup with probability 1, so that the loop-configuration is already fixed. 
This method is similar to (but not quite the same as) the loop algorithm reviewed 
here. It has been used in a sizeable number of studies mentioned in section 6. 

5.3 Directed Loops 

Syljuasen and Sandvik recently developed the worm-like method of directed loops, 
both in SSE and in continuous time [48], for very general Hamiltonians. Like in the 
original worm method, a single "loop" is constructed by propagating a source-term 
( "loop- head" ) until it meets the second source. Updates are performed locally, and 
spins are flipped immediately. The source term can be a creation or an annihilation 
operator. 

At each vertex (in the language of SSE) which the loop-head enters through some 
entry-leg, there are a priori five possibilities: (i) The loop-head can exit through one 
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of the other three legs. One of these is usually forbidden, unless the Hamiltonian 

contains pair-creation/annihilation operators; (ii) it can "bounce", i.e. exit through 
the same leg through which it entered, thus undoing the last spin flip and starting 
to backtrack, or (iii) it can just stop, in case the Hamiltonian contains source terms 
[97]. 

Detailed balance is constructed in the enlarged phase space of local spin config- 
urations and local directed loop paths, similar to the extended phase space of the 
loop algorithm in the Kandel-Domany formalism, section 2.3. Weights W{s,li,l2) 
are assigned to vertices with spin configuration s, entry leg /i and exit leg I2, such 
that 

W{s) = J2W{s,h,l2) (5.1) 

h 

where W{s) is the usual weight of the spin configuration s. Detailed balance for an 
overall spin-update, after the loop closes, requires 

p(s s') W(s) = p(s' s) W(s') . (5.2) 

This can be ensured by demanding that the local weights are symmetric under reversal 
of the loop-path 

W{s,h,l2) = W{s',l2,li) . (5.3) 

Other constraints are symmetries of the weights with respect to spatial and temporal 
reflections. A very desirable criterion is minimization of the bounce-probabilities. 

For the XXZ model in a held, Syljuasen and Sandvik provide such a solution. (See 
also [115].) Within a whole region |A| + ^ < 1 including flnite magnetic fleld, they 
achieve zero bounce probability. For the XY-model (A = 0) this includes all flelds up 
to the saturation fleld. Continuous imaginary time for directed loops is implemented 
in a similar way as in the loop algorithm. 

Directed loops work very well. Syljuasen and Sandvik show that this method 
works better than the previously used worm-like "operator loop update" [13] (not 
identical to the loop algorithm reviewed here) which has larger bounce probabilities. 
It shows very small autocorrelation times in zero fleld and in any magnetic fleld for 
one- and two-dimensional Heisenberg models. The more complex simulations using 
continuous imaginary time show lower autocorrelation times than those using SSE, 
which contain a parameter e that needs to be adjusted. Measured critical exponents 
for integrated autocorrelation times are small: z = for a bilayer Heisenberg model, 
0.25 for the 3D model, and about 0.75 for a ID chain. 

In ref. [48] it is pointed out that directed loops differ from the worm algorithm of 
section 5.1, which show much larger autocorrelations in a magnetic fleld [111]. 

It is interesting to compare worms and loops in cases with similar move-probabilities 
at vertices. At /i = 0, |A| < 1, the directed loop method has the same probabili- 
ties for traversing a vertex as the loop algorithm described in this review. Yet the 
methods are not the same: When the loop-head reaches a vertex a second time, the 
exit decision in a worm-method is in general independent of the previous route at 
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this vertex, unlike the breakup-decision of the loop algorithm. Therefore the worm 
can self-overlap (even if it does not boTincc), and a set of worms does not subdivide 
the lattice into clusters. (The "deterministic operator loops" [13] although originally 
devised as worms, are an exception.) One consequence is that improved estimators 
are not available, except those obtained from the presence of sources. Therefore 
methods like the meron-method or the infinite-lattice method, which intrinsically use 
improved estimators, (i.e. the loop-representation of a model), are not available in a 
worm-approach. 

An adaptation of the directed loop method to Spin 1 was provided by Bergkvist et 
al [148] and apphed to the random bond Heisenberg chain. Within the framework of 
directed loops, Harada and Kawashima recently developed a method to simulate gen- 
eral spin S systems [115]. They stochastically map the usual system of symmetrized 
25* spin-| variables to "coarsened" variables with only 25*+ 1 values. For these coars- 
ened variables, the directed loop-probabilities were worked out directly, so that the 
original 25" variables are not needed, and the simulation can be performed in a single 
variable. 



6 Some Applications 

We briefly point to some of the apphcations of the loop algorithm to show in 
which ways it has been used in practice. For worm-like-methods see section 5. Other 
applications have been mentioned in previous sections. Most recent calculations have 
been performed with the continuous time loop algorithm; many also with the slightly 
different "deterministic" SSE operator loops [32,45,139-147] (see section 5.2). 

Spin \ isotropic Heisenberg antiferromagnets have been investigated in many stud- 
ies. For variations of this model, the loop algorithm has been particularly valuable. 
It has for example allowed high precision calculations of the critical exponents of a 
quantum critical point [49-51, 149] in a 2D depleted system. System sizes up to one 
million sites (at PJ = 5.5) [150] and temperatures down to /3J = 1000 (for 2500 sites) 
[151] have been accessible. No sign of critical slowing down has been reported in the 
calculations for this model without magnetic fields. 

The even/odd structure [95] and correlation lengths of spin ladders [19, 92, 99, 152- 
155] and coupled ladders [156-158], including quantum phase transitions in three 
dimensionally coupled chains and ladders [141, 159] have been investigated in detail, 
with a very extensive recent study in ref. [160]. Very precise studies of the finite 
size scaling of the 2D system [150, 161-163] have been able to extract the asymptotic 
infinite lattice low temperature behavior, with correlation lengths up to 350,000, and 
to test the predictions from chiral perturbation theory. Similar studies have been 
performed for layered 2D systems [147, 164, 165] . Universality of the KT transition 
temperature in a bilayer system with small magnetic field has been shown [166]. 

Dimerized systems have been studied extensively [167-171]. Random systems 
have become accessible at sufficiently low temperatures and large system sizes, in- 
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eluding the necessary averaging over disorder realizations. Bond disorder in chains 
[36, 65, 172-175], bond dilution in 2d systems [144, 176, 177] and coupled layers [145], 
as well as random nonmagnetic impurities in ladders [178-180], coupled ladders 
[181], and two-dimensional systems [142,143,146,151,182-184] have been investi- 
gated. Frustrated models with a sign problem have been simulated by making use of 
the increased precision [42, 107, 160] and by removing the sign problem in a semifrus- 
trated case[32]. Spin chains with quantum phonons have been investigated both in 
first [185] and second quantization [139, 186, 187]. Transport properties have recently 
become accessible in spin chains [118, 188]. 

For the quantum XY-model, accessibility of winding number fluctuations has al- 
lowed high precision studies of the Kosterlitz-Thouless transition (including a model 
with site disorder [110]) via the jump in helicity [100, 101]. 

With Ising anisotropy, magnetic field driven transitions could be studied in two 
and three dimensions for moderately large fields [102]. Griffiths-McCoy singularities 
have become accessible for the random transverse Ising model (section 4.4) in two 
dimensions [17, 189, 190]. 

For higher spin representations, correlations in one [37,38] and two [35,163] di- 
mensions and for ladders [98] have been analyzed with very high precision, including 
computation of the spin-1 Haldane gap to 5 digits [37]. The effects of bond dimeriza- 
tion [169,191-193] including a new order parameter [170,171], spatially anisotropic 
coupling [155,169], bond disorder [36,194,195], spin-| impurities [64,140] in 5" = 1 
chains, and additional biquadratic interaction [14, 39] have been investigated, as well 
as a quantum phase transition from site dilution on the square lattice [151]. 

The t-J model has been simulated [41-44, 120-122, 196] in 1 and 2 dimensions with 
up to 2 holes, examining thermodynamic, spectral, and magnetic properties, includ- 
ing two-hole t-J model simulations at previously unaccessibly low temperatures. The 
Hubbard model has been simulated in Id [40] and recently in detail with nearest neigh- 
bor Coulomb interaction and SSE in 2d [45], A non-standard Hubbard-likc model, 
suitable for merons, has been shown to have a KT transition [27-29, 31]. Merons were 
also used to study the tV model and its phase transitions [20, 21, 23, 24, 26]. 

Other applications include the roughening transition in classical spin models [72, 
73], spin-orbit coupling [197,198], relativistic gauge theory [123] (sec section 4.7), 
hardcore bosons with disorder [110], the physics of spins in ordered and fluctuating 
striped systems [157, 158, 199] and a new method to calculate the free energy of a 
quantum system [200]. 

7 Conclusions 

The loop algorithm and its generalizations have opened up exciting new oppor- 
tunities. Many of them remain to be investigated. A summary of advantages and 
limitations of the loop approach has been given in the introduction. For models in 
which it can be applied without sign problem and without overly big global weights. 
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it offers large benefits. Some examples were given in figure 8, section 2.15, and in 

figure 11, section 4.7. For models with sizeable fields. Directed Loops have become 
available. Last, but not least, the mapping to a combined spin and loop model that 
is the basis of the loop algorithm is intruiging on the theoretical side. 
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Appendices 

For correct Monte Carlo simulations it is essential, yet often neglected, that con- 
vergence and statistical errors are properly determined. To facilitate this task we pro- 
vide a prescription. The requirements of detailed balance and ergodicity are briefiy 
summarized in Appendix A. Appendix B discusses autocorrelations and their in- 
crease in physically interesting situation, which can drastically increase the necessary 
simulation times. The loop algorithm was designed to overcome this problem. Au- 
tocorrelations, especially very large ones, can easily can be overlooked and can be a 
serious problem in practice, causing (even drastically) incorrect results. Appendix C 
then describes how to properly ensure convergence and how to calculate correct error 
estimates. 

A Detailed Balance and Ergodicity 

There are several excellent reviews of the Monte Carlo approach, e.g. in refs. 
[60,201-203]. Here we briefly summarize some properties which we need elsewhere. 
The Monte Carlo procedure in classical statistical physics allows stochastic evaluation 
of expectation values 

(O)^^ Y: 0{S)W{S) (A.l) 

Se{S} 
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with respect to the partition function Z — J2se{S} W{S) and the phase space {S}, by 
generating a Markov chain of configurations 5(2), 5(3), which is distributed 
hke W{S). Therefore one can compute (O) from a sample of configurations 

1 " 

{0)= hm -Y,0{S^k+i)) . (A.2) 

1=1 

(In practice, the first k > Texp configurations should be discarded to allow "ther- 
malization" into the Boltzmann distribution. See appendix C.) A set of sufficient 
conditions to achieve this distribution is 

(1) Detailed Balance: The transition probability < p(5(j) — 5(^+1)) < 1 of the 
Markov chain satisfies W{S) p{S^S') = W{S') p{S'^S). 

(2) Ergodicity: Every configuration S e {S} can be reached from every other con- 
figuration with finite probability in a finite number of steps. 

Solutions for detailed balance are for example the Metropolis probability [204] 

p(5^5')=max(l,^) (A.3) 
and the heat bath like probability 

'-f^^^') = W(S) + W(S') + const ■ 

It is often advantageous, as it is for the loop algorithm, to spht the weight 1^(<S) 
into two parts: 

W{S) = Wi{S) ■ W2iS) . (A.5) 

Let pi{S — > S') be a transition probability that satisfies detailed balance with 
respect to Wi. We get a Monte Carlo procedure for W by using W2 as a "filter" 
to accept or reject S'. More precisely: First apply pi to propose a Markov step 
<S — > <S'. Then decide with a probabihty Paccept{S — > S') whether to take S' as the 
next configuration in the Markov chain. Otherwise keep S. Here Paccept only needs to 
satisfy detailed balance between S and S' with respect to W2, 

W2(S) PacceptiS^S') - W2{S') Paccept{S' ^ S) . (A.6) 

One can easily see that the overall update satisfies detailed balance with respect to 
W. For Paccept we can for example choose the heatbath probability Paccept{S -^S') = 
W2{S')/{W2{S) + W2{S')). Ergodicity has to be shown separately for the overall 
procedure. 
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B Autocorrelations and Critical Slowing Down 

Successive configurations 5(2), 5(3), ... in the Markov chain of a Monte Carlo 
configuration are correlated. Here we discuss the corresponding autocorrelation times, 
which can be extremely large. For other treatments of this topic, see e.g. references 
[60,201,203,205]. We follow refs. [60] and [40] in shghtly simphfied form. Define a 
Monte Carlo average from n measurements 

— 1 " 

o-.^-y: (B.i) 

and define the autocorrelation function for the observable O 
Coo{t) (0(,)0(.+*))-(0(.))(0(^+t)) 

Cooit) := I Er=i { {O^i) - \ Eti ^«)) {Oa^t) - i Er=i } 
with the normalized version 



(B.2) 



Typically, Too{t) is convex and will decay exponentially at large t like e^'*'/^. Define 
the exponential autocorrelation time for the observable O by this asymptotic decay 

rZv ■= sup ,,, . (B.4) 



t-^ -log |roo(i)| 



This is the relaxation time of the slowest mode in the Monte Carlo updates which 
couples to O. The slowest overall mode is 



^exp 



sup (rfj (B.5) 



and corresponds to the second largest eigenvalue of the Markov transition matrix. 
(The largest eigenvalue is 1 and has the Boltzmann distribution as eigenvector). 

If the 0{i) were statistically independent, then the error estimate of O would be 
a/ \/n with 



a' = —-rCoo{^) . (B.6) 
Instead, the statistical error of O is controlled by the integrated autocorrelation time 

-| CXD 

rZt--=^ + Y.^oo{t) (B.7) 

and becomes Ointj \fn with [203] 

^ 2r^, Coo(O) forn » r^^, . (B.8) 
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Therefore a Monte Carlo run of n measurements effectively contains only nj i^rf^i) 
independent samples for measuring (O). If Taa{t) is a single exponential e^*/^<=^p. i.e. 
if only a single mode of the Markov transition matrix couples to O, then t^^. = r^^, 
otherwise rg^ < rg^. 

In simulations of classical statistical systems, autocorrelation times typically grow 

like 

rZ,e.p-^<L,^r^nUio)^ (B.9) 

where L is the linear size of the system, ^ is the physical correlation length in the 
infinite volume limit at the same couplings, and called the (Monte-Carlo) 

dynamical critical exponent. In general, z^'" depends on the observable O, and 
^i^fi^) 7^ ^^pi^)- (Note that rg^ is a correlation time, whereas rg^ resembles the 
corresponding "susceptibility" ; they will in general have different critical behavior.) 
For local unguided updates, including the case of local updates in the determinental 
formalism, one has so far always found 



MCJocal 



2 . (B.IO) 



T ~ 



The intuitive reason is that changes in a configuration have to spread over a distance 
min(L,^) in order to provide a statistically independent configuration. With local 
updates, this spread resembles a random walk with step size one [206], which needs 
steps to travel a distance r. For local overrelaxed updates (see ref. [60] and section 
5.1), z^'-^ can be as small as 1. 

In nonrelativistic quantum simulations, space and imaginary time are asymmetric. 
With local Monte Carlo updates one can expect 

jmax ( min(L, , ^ min(/?, ^) ) | , (B.ll) 

where now L and ^ are spatial lengths, iiji^r = Lt is the temporal extent of the 
lattice, A is the energy gap, and 1/(AAt) is the temporal correlation length. Again 
we need to distinguish Tg^ from r£p, and again z^^ ^ 2 for local updates. 

Close to phase transitions (.^-^oo) or at low temperatures and small gaps (A— s>0), 
the autocorrelation times of local algorithms, with 2;^*-^ >,2, will grow very fast. Away 
from a phase transition, at finite ^ and A, r should reach a constant value for large 
L and p. In addition, one needs to take the limit At— >0. For algorithms in discrete 
time, this results in a large factor l/(Ar)''^"'"^'' in required computer time. (We get 
"(z + 1)" since each MC sweep has to update (3/ At timeslices). With the loop 
algorithm, on the other hand, critical slowing down often disappears: z'^'-^ ~ 0, and 
in continuous time, the factor 1/Ar is replaced by a constant of order 1. 



C Convergence and Error Calculation 



Since MC measurements are correlated (see appendix B), it is not at all trivial to 
calculate correct statistical errors, or even to ensure convergence of a MC simulation 
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(see also figure 11). For error calculations, there are two good strategies in practice, 
binning and Jackknife (or the related bootstrap method), which can also be combined. 
Let us emphasize that to ensure convergence, it is indispensable to begin simulations 
on extremely small systems and at unproblematic parameters (e.g. high temperature), 
and to slowly increase system size, while monitoring autocorrelations through binning 
and/or a thorough analysis of the time series and its autocorrelation function F for 
all observablcs to be measured (preferrably for all suspect observables) . Otherwise, 
if one starts on too big a system, the simulation can all too easily be locked in some 
region of phase space, without any detectable signal in measured autocorrelations, but 
with possibly entirely wrong results. In practice a very good instrument for detecting 
moderately long autocorrelations is to just look at time series of various observables. 
This way one can notice long time scales with far less statistics than necessary in the 
analysis of autocorrelation functions. 

Remarkably, it has been discovered that it is possible to perform Monte Carlo sim- 
ulations with exact convergence, generating completely independent configurations 
from the exact desired Boltzmann distribution, although with rather large compu- 
tational effort. Such an Exact Monte Carlo was developed as "Coupling From The 
Past" (CFTP) by Propp and Wilson [207]. An extension has been proposed for for 
Swendsen- Wang- like cluster algorithms by M. Huber [207], but appears to be too 
slow. 

Binning: (We follow ref. [40]). Group the n measurements 0{i) = into k 
bins of length I — n/k with (e.g.) Z = 2, 4, 8, ... . Compute k bin averages 



1 



bl 



Ob{l):^j E ,b^l,..,k (C.l) 

* i={b-l)l+l 

and the variance of these averages 

'^'(0:=r^E(^6(0-^)' ■ (C.2) 

This variance should become inversely proportional to / as the bin size / becomes 
large enough, whence the Ob{l) as a function of b become statistically independent 
[205]. 

The expectation value of the quantity 

-fn.(0 - ^ , (C.3) 

where a is given by eq. (B.6), grows monotonically in I. When statistical indepen- 
dence is approached, T^f{l) approaches the integrated autocorrelation time r-*^^ from 



below. The converged asymptotic value of T^^f{l) (or rather its expectation value) can 
therefore be used in eq. (B.8) to compute the actual statistical error of O. Note that 
T^t{l) will start to fluctuate at large I, since for flnite number of measurements the 
number of bins k becomes small. 
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Convergence: If T^t{l) does not converge, then its expectation value at the 
largest available / is a lower bound for r^^, giving a lower bound for the error of O. 
In that case the MC run has not converged, and the data cannot be used to deduce 
physical results for (O). Convergence of r^til) is a prerequisite for using the MC 
results. Since varies for different observables O, t^i{1) niay have converged for 
some O, and not for others. This is a dangerous situation, since the very slow modes 
visible in the nonconverged observables may be relevant for the apparently converged 
observables, too. Moreover, before starting measurements, the Monte Carlo config- 
uration must be allowed to thermalize, i.e. to approach the Boltzmann distribution. 
It can be shown that the thermalization (from an arbitrary starting configuration) 
is governed by the overall exponential autocorrelation time Tg^p, i-e. the very largest 
time scale in the simulation. The thermalization time needs to be a reasonably large 
multiple of r^^p. Therefore it is necessary to have at least an upper bound on r^xp 
available. If an insufficient time is spent on thermalization, then the MC averages O 
contain a systematic bias, and will converge more slowly. A good approach in practice 
is to spend 10 — 20% of the total simulation time on thermahzation, which still does 
not noticeably increase statistical errors. 

An unfortunate problem in practice is that simulations may be started on an overly 
big system, for which - unbeknownst to the simulator - there are huge autocorrelation 
times. Then it may happen that within any feasible MC run, these large time scales 
remain invisible, so that the MC run appears to have converged, whereas in reality 
it has barely moved in phase space and the results may be completely wrong. (Take 
for example the simulation of a simple Ising model with a local algorithm at low 
temperature. The total magnetization takes an exponentially large time to change 
sign. It may never do so during the simulation, and may appear converged at a 
large finite value, whereas the true average magnetization for a finite system is zero.) 
Unless an "Exact Monte Carlo" method [207] can be used, the only way to avoid 
this problem is apparently to begin simulations on extremely small systems and away 
from problematic parameter regions, so that convergence is guaranteed by brute force. 
Slowly increasing system size, while measuring autocorrelation times, one can ensure 
that autocorrelations do not get out of hand. Note that this approach does not require 
much additional computer time, since simulations on small systems will be fast. 

One rather sensitive and simple instrument to detect some autocorrelations long 
before they arc visible in a binning analysis is to simply plot the MC evolution 
0{i), i = 1, ...,n graphically and to look for long correlations by eye. 

Autocorrelation function: A quantitative analysis of autocorrelations beyond 
Tj^j, e.g. in order to calculate r£p, requires calculation of the autocorrelation function 
^oo{t). This is feasible only when O has converged. Contrary to claims in the 
literature, one cannot reliably extract the integrated autocorrelation time from 
^ooit) by neglecting it beyond a "window" t < W selfconsistently determined from 
the slope of TooiW). Typically (but simplified [60]), we have 



j 



(C.4) 
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with contributions from all eigcnmodcs of the Markov transition matrix, unless they 
are orthogonal to O (whence = 0). Therefore r^^^ ^ Y^jcf^j can get sizeable 
contributions from very large time scales Tj, even when they couple only with small 
matrix elements and are therefore not visible at small times t. This does indeed 
commonly happen in practice. A more reliable procedure is the following: Ensure 
convergence of O. Calculate Too{t) for t < tmax: where t^ax is chosen as large as 
possible while Too{t) remains well above zero within error bars for all t < tr, 



^max • 



Compute estimates for and its matrix element from the (hopefully) asymp- 
totic decay of Tooi^)- Calculate from eq. (B.7) by summing t up to the order 
of and computing the remainder of from the asymptotic form of eq. (C.4), 

Too{t) ~ c^pe~*/'^«^p . Of course, even this procedure will fail if the MC run is too 
short to show the largest autocorrelation times. 

Jackknife: A binning-type analysis is a prerequisite for checking convergence. 
It also produces values for the autocorrelation times t^^^. However, standard error 
propagation becomes rather cumbersome for nonlinear quantities, like correlation 
functions in simulations with a sign-problem. It is much easier to compute errors 
with the jackknife procedure [208]. We give a brief recipe. Split the measured values 
0{i) into k groups of length I = n/k. To obtain the asymptotic error, I must be 
significantly larger than the relevant autocorrelation time r^^^. 

Now perform the complete, possibly highly nonlinear, analysis of the MC-data 
k + 1 times: first with all I ■ k data, leading to a result "i?^'^)", then, for j — 1, .., k, 
with all data except those in bin j (i.e. pretend that bin j was never measured), 
leading to values "i?*^-')". Then the overall result R is 

R = i?(o) - Bias, where 

Bias = (fc-l)(i?'^^-i?(°)), (C.5) 

ryav _ 1 y^fc ry{j) 



with statistical error 



1/2 

6{R) = {k- ly/^ I y V(i?(^'))' - (i?"")' 1 ■ (C.6) 




In this procedure, error propagation is automatic. In each of the k+1 analyses, almost 
the full set of data is used, avoiding problems in the usual analysis like instabilities of 
fits. It is also possible to combine Jackknife and binning by repeating the Jackknife 
procedure for different bin lengths, to check for convergence and to compute integrated 
autocorrelation times and asymptotic errors according to eq. (C.3). 
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